In June of 2017, Intel partnered with MobileODT to challenge Kagglers to develop an algorithm with tangible, real-world impact: accurately identifying a woman’s cervix type in images. This matters because choosing an effective cervical cancer treatment depends on the doctor’s ability to make this identification accurately. While cervical cancer is easy to prevent if caught in its pre-cancerous stage, many doctors don’t have the skills to reliably discern the appropriate treatment.
In this winners’ interview, the first-place team, ‘Towards Empirically Stable Training’, shares insights into their approach, such as why it was important to invest in creating a validation set and why they developed bounding boxes for each photo.
What was your background prior to entering this challenge?
Ignas Namajūnas (bobutis) – Mathematics BSc, Computer Science MSc and 3 years of R&D work, including around 9 months of being the research lead for a surveillance project.
Jonas Bialopetravičius (zadrras) – Software Engineering BSc, Computer Science MSc, 7 years of professional experience in computer vision and ML, currently studying astrophysics where I also apply deep learning methods.
Darius Barušauskas (raddar) – BSc & MSc in Econometrics, 7 years of ML applications in various fields, such as finance, telcos, utilities.
Do you have any prior experience or domain knowledge that helped you succeed in this competition?
We have a lot of experience in training object detectors. Additionally, Jonas and Ignas have won a previous deep learning competition, The Nature Conservancy Fisheries Monitoring Competition. It required similar know-how, so that experience transferred easily to this task.
How did you get started competing on Kaggle?
We saw Kaggle as an opportunity to apply our knowledge and skills obtained in our daily jobs to other fields as well. We also saw a lot of opportunity to learn from the great Machine Learning community the Kaggle platform has.
What made you decide to enter this competition?
The importance of this problem and the fact that it could be approached as object detection, where we already had success in a previous competition.
Let’s get technical:
Did any past research or previous competitions inform your approach?
We have been using Faster R-CNN in many tasks we have done so far. We believe that by tuning the right details it can be adapted to quite different problems.
What preprocessing steps have you done?
Since we had a very noisy dataset, we spent a lot of time manually looking at the given data. We noticed that the additionally provided dataset had many blurry and non-informative photos, and we discarded a large portion of them (roughly 15%). We also hand-labeled the photos by drawing bounding boxes around the regions of interest in each photo (in both the original and the additional dataset). This was essential for our methods to work, and it helped a lot during model training.
What supervised learning methods did you use?
We used a few different variants of Faster R-CNN models with VGG-16 feature extractors. We ended up with 4 models, which we ensembled. These models also had complementary models for generating bounding boxes on the public test set and for detecting night-vision-like images. Some of these 4 models alone were enough to place us 1st.
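The interview does not say how the 4 models were combined. As a minimal sketch of one common option, assuming each model outputs a probability per cervix type for an image, the ensemble could simply average those probabilities. The function name and the three-class toy numbers below are hypothetical, chosen only for illustration.

```python
def average_predictions(per_model_probs):
    """Average class probabilities across models for a single image.

    per_model_probs: list with one probability vector per model.
    Plain averaging is an assumption; the team's actual ensembling
    scheme is not described in the interview.
    """
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    avg = [sum(p[c] for p in per_model_probs) / n_models
           for c in range(n_classes)]
    total = sum(avg)
    return [v / total for v in avg]  # renormalise to guard against drift


# Hypothetical outputs of 4 models for one image and 3 classes.
model_outputs = [
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
]
ensemble = average_predictions(model_outputs)
```

Averaging tends to smooth out the overconfident mistakes of any single model, which matters when predictions are scored on probabilities rather than hard labels.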
What was your most important insight into the data?
A proper validation scheme was super important. We noticed that the additional dataset contained many photos very similar to those in the original training set, which caused problems when we wanted to use the additional data in our models. Therefore, we applied K-means clustering to create a trustworthy validation set. We clustered all the photos into 100 clusters and took 20 random clusters as our validation set. This helped us track whether the data augmentations we used in our models were useful or not.
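The cluster-level holdout described above can be sketched roughly as follows. This assumes every photo already has a cluster label from a K-means run; the interview does not say which image features were clustered, so that step is left out. The function name and the toy labels are hypothetical.

```python
import random


def split_by_cluster(cluster_ids, n_val_clusters=20, seed=0):
    """Hold out whole clusters so similar photos stay on one side of the split.

    cluster_ids: one cluster label per photo (assumed to come from an
    earlier K-means run with 100 clusters).
    Returns (train_indices, val_indices).
    """
    clusters = sorted(set(cluster_ids))
    rng = random.Random(seed)
    val_clusters = set(rng.sample(clusters, n_val_clusters))
    train_idx = [i for i, c in enumerate(cluster_ids) if c not in val_clusters]
    val_idx = [i for i, c in enumerate(cluster_ids) if c in val_clusters]
    return train_idx, val_idx


# Toy example: 1,000 photos spread evenly over 100 clusters.
labels = [i % 100 for i in range(1000)]
train_idx, val_idx = split_by_cluster(labels)
```

Because entire clusters go to validation, a photo and its near-duplicates are unlikely to end up on both sides of the split, which is what makes the validation score trustworthy.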
We also saw that augmenting the red color channel was critical, therefore we used a few different red color augmentations in our models.
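The exact red color augmentations are not specified in the interview. One simple form, shown here only as an assumption, is to rescale the red channel by a random gain while leaving green and blue untouched. The function names and the `max_change` parameter are hypothetical.

```python
import random


def augment_red(pixels, scale, shift=0):
    """Rescale and shift only the red channel of an RGB image.

    pixels: list of (r, g, b) tuples with 8-bit values (0-255).
    """
    out = []
    for r, g, b in pixels:
        new_r = int(round(r * scale + shift))
        new_r = max(0, min(255, new_r))  # clamp to the valid 8-bit range
        out.append((new_r, g, b))
    return out


def random_red_augment(pixels, rng, max_change=0.2):
    """Apply a random red gain drawn from [1 - max_change, 1 + max_change]."""
    scale = rng.uniform(1.0 - max_change, 1.0 + max_change)
    return augment_red(pixels, scale)


image = [(100, 50, 25), (250, 10, 10)]
brighter = augment_red(image, scale=1.2)
jittered = random_red_augment(image, random.Random(0))
```

In a real pipeline the same idea would be applied to image arrays during training, with a fresh random gain drawn for every sample.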
Having two datasets of differing quality, we also experimented with undersampling the additional dataset. We found that keeping the original:additional image count ratio at 1:2 was optimal (in contrast to a ratio of 1:4 if no undersampling was applied).
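A minimal sketch of this undersampling follows, assuming the two datasets are plain lists of file names and that the kept subset is drawn uniformly at random. The interview does not say how the subset was chosen, and the file names below are hypothetical.

```python
import random


def undersample_additional(original, additional, ratio=2, seed=0):
    """Keep at most `ratio` additional images per original image."""
    target = min(len(additional), ratio * len(original))
    rng = random.Random(seed)
    return rng.sample(additional, target)


original = [f"orig_{i}.jpg" for i in range(100)]
additional = [f"extra_{i}.jpg" for i in range(400)]  # 1:4 before undersampling
kept = undersample_additional(original, additional)
train_files = original + kept  # now 1:2 original to additional
```

Redrawing the subset with a different seed for each model or epoch is one possible variation, since it lets more of the additional data be seen over time while holding the ratio fixed.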
Were you surprised by any of your findings?
From manual inspection, it seems that different types of cancerous cervixes had differing blood patterns. So focusing on blood color in the photos seemed logical.
Which tools did you use?
We used our customized version of R-FCN (which also includes Faster R-CNN). The original version can be obtained at https://github.com/Orpine/py-R-FCN.
How did you spend your time on this competition?
The first few days were dedicated to creating image bounding boxes and thinking about how to construct a proper validation set. After that, we kept our GPUs running non-stop while discussing which data augmentations we should try.
What does your hardware setup look like?
We had two GTX 1080s and one GTX 980 for model training. The whole ensemble takes 50 hours to train and 7-10 seconds per image at inference. Our best single model takes 8 hours to train and 0.7 seconds per image at inference.
Words of wisdom:
Many different problems could be tackled using the same DL algorithms. If a problem can be interpreted as an image detection problem, detecting fish types or certain cervix types becomes somewhat equivalent, even though knowing which details to tune for each problem might be very important.
How did your team form?
We have been colleagues and acquaintances for a long time. On top of that, we are part of a larger team aiming to solve medical tasks with computer vision and deep learning.
How did your team work together?
We used Slack for communication and had a few meetings as well.
How did competing on a team help you succeed?
It was much easier as we could split roles. Darius worked on image bounding boxes and setting up the validation, Jonas worked on the codebase, and Ignas was brainstorming which data augmentations to test.
Just for fun:
If you could run a Kaggle competition, what problem would you want to pose to other Kagglers?
Given a patient’s medical records, X-rays, ultrasounds, etc., predict which disease the patient is likely to suffer from in the future. Combining different sources of information sounds like an interesting challenge.
What is your dream job?
Creating deep-learning-based software that helps doctors make faster, more accurate decisions and treat patients more efficiently.