Decoding Fastai's Mid-Level APIs
Harnessing Fastai's Mid-Level APIs for better data processing.
- Data Gathering
- Using the DataBlock API
- Using Fastai Transforms
- Setting up internal state with a setups method
- Combining our Transform with data augmentation in a Pipeline
- TfmdLists and Datasets
Processing your data is one of the most time-consuming tasks in building any ML model. Fastai provides elegant tools for data processing that make the whole exercise almost effortless.
We have already seen the DataBlock API (explained in my post, Building an image classifier using Fastai), which is itself a streamlined way to work with most types of datasets in a beautifully simple manner.
In this blog post, we'll look at how this DataBlock API came about and what happens under the hood, using the various Mid-Level APIs Fastai provides for data processing.
For this post, I'll be working with the APTOS 2019 Blindness Detection Challenge, a Kaggle competition to predict and prevent the onset of diabetic retinopathy by predicting the degree of severity of the disease in patients.
#collapse
from fastai.vision.all import * # also brings in pandas as pd and pathlib's Path

dataset_path = '/media/harish3110/AE2461B824618465/datasets/aptos_blindness_detection'
path = Path(dataset_path)
Path.BASE_PATH = path # fastai patch: display paths relative to this base

train = pd.read_csv(path/'train.csv')
train.head()
We can see that the diagnosis is labelled numerically, which isn't very informative, so let's create a dictionary that maps these values to what each one actually represents.
label_dict = {
    0: 'No DR',
    1: 'Mild',
    2: 'Moderate',
    3: 'Severe',
    4: 'Proliferative DR'
}
#collapse
# Fastai's presizing trick: resize large at the item level (CPU), then
# augment down to the final size at the batch level (GPU)
item_tfms = Resize(480)
batch_tfms = aug_transforms(size=224, min_scale=0.75)
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_x=ColReader('id_code', pref=path/'train_images', suff='.png'),
    get_y=Pipeline([ColReader('diagnosis'), label_dict.__getitem__]),
    splitter=RandomSplitter(seed=42),
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)
dls = dblock.dataloaders(train)
dls.show_batch(max_n=3)

# The underlying datasets (before batching) can be inspected directly
dsets = dblock.datasets(train)
dsets[0]

# summary walks a sample through every step of the pipeline; handy for debugging
dblock.summary(train.iloc[:10])
The aim is to recreate this same dataloader using Fastai's Mid-Level APIs and understand all that is going on behind the scenes, along with the tools Fastai provides to make this process better!
In my previous post, Introduction to NLP using Fastai, I introduced the concept of Transforms in Fastai: an almost reversible function which is the basic building block for processing data in the Fastai library. This reversibility of a transform is especially useful when performing the different types of data transformations needed before we can batch items and pass them through a dataloader. We saw transforms like Tokenization and Numericalization for making text data ready; similarly, for an image classification task like the one at hand, we need to apply a sequence of transformations to our images: resizing them, performing certain data augmentations, and converting them to tensors for training a PyTorch model.
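As a minimal sketch of what "almost reversible" means: a Transform pairs an encodes method with an optional decodes that undoes it (the Doubler name below is just illustrative):

# A minimal sketch of a reversible Transform: encodes doubles an int,
# decodes halves it back
class Doubler(Transform):
    def encodes(self, x: int): return x * 2
    def decodes(self, x: int): return x // 2

doubler = Doubler()
doubler(3), doubler.decode(6) # -> (6, 3)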
class TitledImage(Tuple):
    def show(self, ctx=None, **kwargs):
        show_titled_image(self, ctx=ctx, **kwargs)

class Tfm(ItemTransform):
    def __init__(self, vocab, o2i, lblr):
        self.vocab, self.o2i, self.lblr = vocab, o2i, lblr
    def encodes(self, o):
        # Build the image from the id_code column and look up the label's index
        return (PILImage.create(ColReader('id_code', pref=path/'train_images', suff='.png')(o)), self.o2i[self.lblr(o)])
    def decodes(self, x):
        # Map the index back to a readable label for display
        return TitledImage(x[0], self.vocab[x[1]])
# labeller maps a row to its readable label; vocab/o2i are the forward and
# reverse mappings between labels and indices
labeller = Pipeline([attrgetter('diagnosis'), label_dict.__getitem__])
vals = list(map(label_dict.__getitem__, list(train['diagnosis'].values)))
vocab,o2i = uniqueify(vals, sort=True, bidir=True)
aptos = Tfm(vocab, o2i, labeller)
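If uniqueify is new to you: with sort=True and bidir=True it returns the sorted unique values along with a reverse value-to-index mapping. A toy sketch:

# Toy example of uniqueify (illustrative, not part of our dataset)
uniqueify(['b', 'a', 'b'], sort=True, bidir=True) # -> (['a', 'b'], {'a': 0, 'b': 1})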
train.iloc[0]
x,y = aptos(train.iloc[0])
x.shape,y
dec = aptos.decode(aptos(train.iloc[0]))
dec.show()
We can now make our Transform class automatically set up its state from the data. This way, when we combine our Transform with the data, it will get set up without us having to do anything. This is done by adding a setups method to the Transform definition:
class Tfm(ItemTransform):
    def setups(self, items):
        self.labeller = Pipeline([attrgetter('diagnosis'), label_dict.__getitem__])
        vals = list(map(label_dict.__getitem__, list(train['diagnosis'].values)))
        self.vocab, self.o2i = uniqueify(vals, sort=True, bidir=True)
    def encodes(self, o):
        return (PILImage.create(ColReader('id_code', pref=path/'train_images', suff='.png')(o)), self.o2i[self.labeller(o)])
    def decodes(self, x):
        return TitledImage(x[0], self.vocab[x[1]])
aptos = Tfm()
aptos.setup(train) # setup calls our setups method with the data
x,y = aptos(train.iloc[0])
x.shape, y
dec = aptos.decode((x,y))
dec.show()
We can also take advantage of fastai's data augmentation transforms if we give the right type to our elements. Instead of returning a standard PIL.Image, if our transform returns the fastai type PILImage, we can then use any of fastai's transforms with it. Let's just return a PILImage for our first element:
class Tfm(ItemTransform):
    def setups(self, items):
        self.labeller = Pipeline([attrgetter('diagnosis'), label_dict.__getitem__])
        vals = list(map(label_dict.__getitem__, list(train['diagnosis'].values)))
        self.vocab, self.o2i = uniqueify(vals, sort=True, bidir=True)
    def encodes(self, o):
        return (PILImage.create(ColReader('id_code', pref=path/'train_images', suff='.png')(o)), self.o2i[self.labeller(o)])
    def decodes(self, x):
        return TitledImage(x[0], self.vocab[x[1]])
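Why the fastai type matters: PILImage subclasses PIL's Image.Image, but unlike a plain PIL image it keeps its type as it moves through a pipeline, which is what lets type-dispatched transforms like Resize and ToTensor act on it. A quick sanity check (the file-name construction below is illustrative):

from PIL import Image # explicit import just for this check

img = PILImage.create(path/'train_images'/(train.iloc[0]['id_code'] + '.png'))
type(img), isinstance(img, Image.Image) # -> (PILImage, True)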
We can then combine that transform with ToTensor, Resize, or FlipItem (to randomly flip our image) in a Pipeline:
tfms = Pipeline([Tfm(), Resize(224), FlipItem(p=1), ToTensor()])
tfms
Calling setup on a Pipeline will set up each transform in order:
tfms.setup(train)
tfms.vocab
x,y = tfms(train.iloc[0])
x.shape,y
We can see ToTensor and Resize were applied to the first element of our tuple (which was of type PILImage) but not the second. We can even have a look at our element to check the flip was also applied:
tfms.show(tfms(train.iloc[0]))
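That selective behaviour is fastai's type dispatch at work: a plain (non-item) Transform maps over the elements of a tuple and only changes the ones whose type matches its encodes signature. A minimal sketch (OnlyInts is an illustrative name):

# Only ints match the encodes signature, so the str passes through untouched
class OnlyInts(Transform):
    def encodes(self, x: int): return x + 1

OnlyInts()((3, 'hello')) # -> (4, 'hello')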
Creating a TfmdLists just requires a list of items and a list of transforms that will be combined in a Pipeline:
class Tfm(ItemTransform):
    def setups(self, items):
        self.labeller = Pipeline([ColReader('diagnosis'), label_dict.__getitem__])
        vals = map(self.labeller, items)
        self.vocab, self.o2i = uniqueify(vals, sort=True, bidir=True)
    def encodes(self, o):
        return (PILImage.create(ColReader('id_code', pref=path/'train_images', suff='.png')(o)), self.o2i[self.labeller(o)])
    def decodes(self, x):
        return TitledImage(x[0], self.vocab[x[1]])
# Used to create separate train and validation transformed lists when creating TfmdLists
splitter = RandomSplitter(seed=42)
splits = splitter(train)
splits
# Presizing to 480, reusing the splits computed above
tls = TfmdLists(train, [Resize(480), Tfm(), ToTensor()], splits=splits)
Note: TfmdLists calls setup on each of the transforms provided above, so we don't need to call it ourselves. Also, a Pipeline sorts its transforms by their order attribute, so Tfm() (default order 0) actually runs before Resize (order 1), even though Resize is listed first.
x,y = tls[0]
x.shape,y
tls.vocab
tls.show((x,y))
# Another method to show
show_at(tls, 0)
dls = tls.dataloaders(bs=64)
dls.vocab
dls.show_batch(max_n=3)
You can even add augmentation transforms, since we have a proper fastai-typed image. Just remember to add the IntToFloatTensor transform that handles the conversion from int to float (fastai's augmentation transforms on the GPU require float tensors). When calling TfmdLists.dataloaders, you pass the batch_tfms to after_batch (and any new item_tfms to after_item):
dls = tls.dataloaders(bs=64, after_batch=[IntToFloatTensor(), *aug_transforms(size=224, min_scale=0.75)])
dls.show_batch(max_n=3)
Datasets applies a list of lists of transforms (or a list of Pipelines) lazily to the items of a collection, creating one output per list of transforms/Pipeline. This makes it easier to separate the steps of a process, so that we can re-use them and modify the process more easily. This is what lays the foundation for the data block API: we can easily mix and match types as inputs or outputs, as they are associated with certain pipelines of transforms.
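To see the idea in isolation before applying it to our data, here's a toy sketch: two pipelines over the same items produce one output each, giving a tuple per item:

# Toy Datasets (illustrative): each inner list becomes one pipeline/output
toy = Datasets([1, 2, 3], [[lambda x: x * 2], [lambda x: x + 10]])
toy[0] # -> (2, 11)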
x_tfms = [ColReader('id_code', pref=path/'train_images', suff='.png'), PILImage.create]
y_tfms = [ColReader('diagnosis'), label_dict.__getitem__, Categorize()]
tfms = [x_tfms, y_tfms]

dsets = Datasets(train, tfms, splits=splits)

dls = dsets.dataloaders(bs=64,
                        after_item=[Resize(480), ToTensor(), IntToFloatTensor()],
                        after_batch=[Normalize.from_stats(*imagenet_stats)])
dsets[0]
dls.show_batch(max_n=3)