Processing data is one of the most time-consuming tasks in building any ML model. Fastai provides elegant ways to handle this step, making the process almost too easy.

We have already seen the DataBlock API, as explained in my post Building an image classifier using Fastai, which in itself is a streamlined way to work with most types of datasets in a beautifully simple manner.

In this blog post, we'll look at how this DataBlock API came about and what happens under the hood, using the various mid-level APIs Fastai provides for data processing.

Data Gathering

For this post, I'll be working with the APTOS 2019 Blindness Detection challenge, a Kaggle competition to help predict and prevent the onset of diabetic retinopathy by grading the severity of the disease in patients.

#collapse
from fastai.vision.all import *  # DataBlock, transforms, Path, etc.
import pandas as pd

dataset_path = '/media/harish3110/AE2461B824618465/datasets/aptos_blindness_detection'

path = Path(dataset_path)
Path.BASE_PATH = path

train = pd.read_csv(path/'train.csv')
train.head()
id_code diagnosis
0 000c1434d8d7 2
1 001639a390f0 4
2 0024cdab0c1e 1
3 002c21358ce6 0
4 005b95c28852 0

We can see that the diagnosis is labelled numerically, which isn't very readable, so let's create a dictionary that maps these values to what each one actually represents.

label_dict = {
    0: 'No DR', 
    1: 'Mild', 
    2: 'Moderate', 
    3: 'Severe', 
    4: 'Proliferative DR'
}
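
As a quick sanity check (a minimal sketch using plain pandas), we can apply this mapping to the diagnosis column:

# Map the numeric labels to their readable names
train['diagnosis'].map(label_dict).head()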

Using the DataBlock API

#collapse
# Fastai's presizing trick: resize to a large 480px image on the CPU first,
# then crop down to 224px with augmentations on the GPU
item_tfms = Resize(480)
batch_tfms = aug_transforms(size=224, min_scale=0.75)

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    # Build each image's path from the id_code column
    get_x=ColReader('id_code', pref=path/'train_images', suff='.png'),
    # Read the numeric label, then map it to its readable name
    get_y=Pipeline([ColReader('diagnosis'), label_dict.__getitem__]),
    splitter=RandomSplitter(seed=42),
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)

dls = dblock.dataloaders(train) 

dls.show_batch(max_n=3)
dsets = dblock.datasets(train)
dsets[0]
(PILImage mode=RGB size=3216x2136, TensorCategory(1))
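
Note that dblock.datasets only applies the type transforms coming from the blocks (PILImage.create and Categorize), which is why the sample above is still at full resolution; item_tfms and batch_tfms are applied by the dataloaders. A quick way to see this (a sketch; the exact batch size depends on your settings):

# A batch from the dataloaders has been resized and cropped down to 224x224
xb, yb = dls.one_batch()
xb.shape  # e.g. torch.Size([64, 3, 224, 224])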
dblock.summary(train.iloc[:10])
Setting-up type transforms pipelines
Collecting items from         id_code  diagnosis
0  000c1434d8d7          2
1  001639a390f0          4
2  0024cdab0c1e          1
3  002c21358ce6          0
4  005b95c28852          0
5  0083ee8054ee          4
6  0097f532ac9f          0
7  00a8624548a9          2
8  00b74780d31d          2
9  00cb6555d108          1
Found 10 items
2 datasets of sizes 8,2
Setting up Pipeline: ColReader -> PILBase.create
Setting up Pipeline: ColReader -> dict.__getitem__ -> Categorize

Building one sample
  Pipeline: ColReader -> PILBase.create
    starting from
      id_code      001639a390f0
diagnosis               4
Name: 1, dtype: object
    applying ColReader gives
      /media/harish3110/AE2461B824618465/datasets/aptos_blindness_detection/train_images/001639a390f0.png
    applying PILBase.create gives
      PILImage mode=RGB size=3216x2136
  Pipeline: ColReader -> dict.__getitem__ -> Categorize
    starting from
      id_code      001639a390f0
diagnosis               4
Name: 1, dtype: object
    applying ColReader gives
      4
    applying dict.__getitem__ gives
      Proliferative DR
    applying Categorize gives
      TensorCategory(3)

Final sample: (PILImage mode=RGB size=3216x2136, TensorCategory(3))


Setting up after_item: Pipeline: Resize -> ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -> AffineCoordTfm -> RandomResizedCropGPU -> LightingTfm

Building one batch
Applying item_tfms to the first sample:
  Pipeline: Resize -> ToTensor
    starting from
      (PILImage mode=RGB size=3216x2136, TensorCategory(3))
    applying Resize gives
      (PILImage mode=RGB size=480x480, TensorCategory(3))
    applying ToTensor gives
      (TensorImage of size 3x480x480, TensorCategory(3))

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch

Applying batch_tfms to the batch built
  Pipeline: IntToFloatTensor -> AffineCoordTfm -> RandomResizedCropGPU -> LightingTfm
    starting from
      (TensorImage of size 4x3x480x480, TensorCategory([3, 1, 2, 3], device='cuda:0'))
    applying IntToFloatTensor gives
      (TensorImage of size 4x3x480x480, TensorCategory([3, 1, 2, 3], device='cuda:0'))
    applying AffineCoordTfm gives
      (TensorImage of size 4x3x480x480, TensorCategory([3, 1, 2, 3], device='cuda:0'))
    applying RandomResizedCropGPU gives
      (TensorImage of size 4x3x224x224, TensorCategory([3, 1, 2, 3], device='cuda:0'))
    applying LightingTfm gives
      (TensorImage of size 4x3x224x224, TensorCategory([3, 1, 2, 3], device='cuda:0'))

The aim is to try to recreate this same dataloader using Fastai's mid-level APIs, and to understand all that is going on behind the scenes and the tools Fastai provides to make this process better!

Using Fastai Transforms

In my previous post, Introduction to NLP using Fastai, I introduced the concept of Transforms in Fastai as almost reversible functions, which are the basic building blocks for processing data in the Fastai library.

This reversibility of a transform is especially useful when performing the different kinds of data transformations needed before we can batch our items and pass them through a dataloader. We saw transforms like Tokenization and Numericalization for preparing text data; similarly, for an image classification task like the one at hand, we need a sequence of transformations on our images, like resizing them, performing certain data augmentations, and converting them to tensors for training a PyTorch model.
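
As a minimal illustration of this reversibility (a toy sketch; NegTfm is hypothetical, not part of the library), a Transform pairs an encodes method with a decodes method that undoes it:

# A toy reversible Transform: encodes negates a number, decodes negates it back
class NegTfm(Transform):
    def encodes(self, x): return -x
    def decodes(self, x): return -x

tfm = NegTfm()
tfm(3), tfm.decode(tfm(3))  # (-3, 3)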

# A tuple type that knows how to show itself as an image with a title
class TitledImage(Tuple):
    def show(self, ctx=None, **kwargs): 
        show_titled_image(self, ctx=ctx, **kwargs)

# An ItemTransform that turns a dataframe row into an (image, label index) pair
class Tfm(ItemTransform):
    def __init__(self, vocab, o2i, lblr): 
        self.vocab, self.o2i, self.lblr = vocab, o2i, lblr
    def encodes(self, o): 
        return (PILImage.create(ColReader('id_code', pref=path/'train_images', suff='.png')(o)), self.o2i[self.lblr(o)])
    def decodes(self, x): 
        return TitledImage(x[0], self.vocab[x[1]])
# Build the labeller and the vocab / label-to-index mapping
labeller = Pipeline([attrgetter('diagnosis'), label_dict.__getitem__])
vals = list(map(label_dict.__getitem__, list(train['diagnosis'].values)))
vocab, o2i = uniqueify(vals, sort=True, bidir=True)

aptos = Tfm(vocab,o2i,labeller)
train.iloc[0]
id_code      000c1434d8d7
diagnosis               2
Name: 0, dtype: object
x,y = aptos(train.iloc[0])
x.shape,y
((2136, 3216), 1)
dec = aptos.decode(aptos(train.iloc[0]))
dec.show()

Setting up the internal state with a setups method

We can now make our Transform class automatically set up its state from the data. This way, when we combine our Transform with the data, it will get set up without us having to do anything. This is done by adding a setups method to the Transform definition.

class Tfm(ItemTransform):
    def setups(self, items):
        self.labeller = Pipeline([attrgetter('diagnosis'), label_dict.__getitem__])
        vals = list(map(label_dict.__getitem__, list(train['diagnosis'].values)))
        self.vocab,self.o2i = uniqueify(vals, sort=True, bidir=True)

    def encodes(self, o): 
        print(o)  # print the raw row to show what the transform receives
        return (PILImage.create(ColReader('id_code', pref=path/'train_images', suff='.png')(o)), self.o2i[self.labeller(o)])
    def decodes(self, x): 
        return TitledImage(x[0], self.vocab[x[1]])
aptos = Tfm()
aptos.setup(train)
x,y = aptos(train.iloc[0])
x.shape, y
id_code      000c1434d8d7
diagnosis               2
Name: 0, dtype: object
((2136, 3216), 1)
dec = aptos.decode((x,y))
dec.show()

Combining our Transform with data augmentation in a Pipeline

We can also take advantage of fastai's data augmentation transforms if we give the right type to our elements. Instead of returning a standard PIL.Image, if our transform returns the fastai type PILImage, we can then use any of fastai's transforms with it. Let's redefine our Transform (dropping the debug print), making sure we return a PILImage for our first element:

class Tfm(ItemTransform):
    def setups(self, items):
        self.labeller = Pipeline([attrgetter('diagnosis'), label_dict.__getitem__])
        vals = list(map(label_dict.__getitem__, list(train['diagnosis'].values)))
        self.vocab,self.o2i = uniqueify(vals, sort=True, bidir=True)

    def encodes(self, o): 
        return (PILImage.create(ColReader('id_code', pref=path/'train_images', suff='.png')(o)), self.o2i[self.labeller(o)])
    def decodes(self, x): 
        return TitledImage(x[0], self.vocab[x[1]])

We can then combine that transform with ToTensor, Resize or FlipItem to randomly flip our image in a Pipeline. Note that a Pipeline sorts its transforms by their order attribute, so the final order can differ from the order in which we pass them:

tfms = Pipeline([Tfm(), Resize(224), FlipItem(p=1), ToTensor()])
tfms
Pipeline: Tfm -> FlipItem -> Resize -> ToTensor

Calling setup on a Pipeline will set up each transform in order:

tfms.setup(train)
tfms.vocab
(#5) ['Mild','Moderate','No DR','Proliferative DR','Severe']
x,y = tfms(train.iloc[0])
x.shape,y
(torch.Size([3, 224, 224]), 1)

We can see Resize and ToTensor were applied to the first element of our tuple (which was of type PILImage) but not to the second. We can even have a look at our element to check that the flip was applied too:

tfms.show(tfms(train.iloc[0]))
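
This selectivity comes from fastai's type dispatch: a transform's encodes only fires when its signature matches the element's type, and anything else passes through unchanged. A quick sketch (behavior as of fastai v2; Resize only defines encodes for image-like types):

# A TensorCategory doesn't match any of Resize's encodes, so it passes through
Resize(224)(TensorCategory(1))  # TensorCategory(1), unchanged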

TfmdLists and Datasets

Using TfmdLists

One pipeline makes a TfmdLists

Creating a TfmdLists just requires a list of items and a list of transforms that will be combined in a Pipeline:

class Tfm(ItemTransform):
    def setups(self, items):
        self.labeller = Pipeline([ColReader('diagnosis'), label_dict.__getitem__])
        vals = map(self.labeller, items)
        self.vocab,self.o2i = uniqueify(vals, sort=True, bidir=True)

    def encodes(self, o): 
        return (PILImage.create(ColReader('id_code', pref=path/'train_images', suff='.png')(o)), self.o2i[self.labeller(o)])
    def decodes(self, x): 
        return TitledImage(x[0],self.vocab[x[1]])
# Used to create separate train and validation transformed lists when creating TfmdLists
splitter = RandomSplitter(seed=42)
splits = splitter(train)
splits
((#2930) [3578,3510,791,2745,707,3395,2294,2868,3118,2426...],
 (#732) [1506,2110,558,1514,3640,3184,1354,3061,2514,276...])
# Presizing to 480
tls = TfmdLists(train, [Resize(480), Tfm(), ToTensor()], splits=splitter(train))

Note: TfmdLists calls setup on each of the transforms provided above, so we don't need to call it again ourselves.
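
Since we passed splits, the TfmdLists also exposes its training and validation subsets directly (their sizes match the split we created above):

# Two filtered views over the same transformed list
len(tls.train), len(tls.valid)  # (2930, 732)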

x,y = tls[0]
x.shape,y
(torch.Size([3, 480, 480]), 1)
tls.vocab
(#5) ['Mild','Moderate','No DR','Proliferative DR','Severe']
tls.show((x,y))
# Another method to show
show_at(tls, 0)
dls = tls.dataloaders(bs=64)
dls.vocab
(#5) ['Mild','Moderate','No DR','Proliferative DR','Severe']
dls.show_batch(max_n=3)

You can even add augmentation transforms, since we have a properly typed fastai image. Just remember to add the IntToFloatTensor transform that handles the conversion from int to float (fastai's augmentation transforms on the GPU require float tensors). When calling TfmdLists.dataloaders, you pass the batch_tfms to after_batch (and any new item_tfms to after_item):

dls = tls.dataloaders(bs=64, after_batch=[IntToFloatTensor(), *aug_transforms(size=224, min_scale=0.75)])
dls.show_batch(max_n=3)

Using Datasets

Datasets applies a list of lists of transforms (or a list of Pipelines) lazily to the items of a collection, creating one output per list of transforms/Pipeline. This makes it easier for us to separate out the steps of a process, so that we can re-use and modify them more easily. This is what lays the foundation of the data block API: we can easily mix and match types as inputs or outputs, as they are associated with certain pipelines of transforms.

# One pipeline per output: the first builds the image, the second the label
x_tfms = [ColReader('id_code', pref=path/'train_images', suff='.png'), PILImage.create]
y_tfms = [ColReader('diagnosis'), label_dict.__getitem__, Categorize()]
tfms = [x_tfms, y_tfms]

dsets = Datasets(train, tfms, splits=splits)

dls = dsets.dataloaders(bs=64, 
                        after_item=[Resize(480), ToTensor(), IntToFloatTensor()], 
                        after_batch=[Normalize.from_stats(*imagenet_stats)])
dsets[0]
(PILImage mode=RGB size=3216x2136, TensorCategory(1))
dls.show_batch(max_n=3)
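
Like TfmdLists, Datasets knows how to reverse its pipelines: decoding a sample turns the label index back into its readable name via Categorize's vocab (a quick sketch; given the vocab above, TensorCategory(1) should decode to 'Moderate'):

# Decode a sample back to human-readable form
dsets.decode(dsets[0])  # (PILImage mode=RGB size=3216x2136, 'Moderate')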