Introduction/Background

Cancer remains a major health problem, and many forms still lack a reliably effective cure. It is well known that early detection gives patients the best chance of successful treatment. Because there are many types of cancer, each with its own symptoms, some are harder to detect than others. Skin cancer is one type that can often be recognized visually. Skin lesions come in several types with different appearances (actinic keratoses, basal cell carcinoma, melanoma, etc.).

Problem Definition

To diagnose a patient accurately, a clinician must distinguish among numerous types of skin cancer; this can be difficult because some types look similar. Machine learning may help by classifying images of skin lesions by type. Such a model could support doctors by either confirming their diagnosis or suggesting a re-diagnosis. A second opinion from a machine learning model could reduce the number of false positives and false negatives, helping patients receive the most effective treatment for their type of skin cancer.

Proposed Methods

We plan to use the HAM10000 dataset from Kaggle, a collection of about 10,000 images stored in CSV files in which each column holds one pixel value in sequence. After choosing an image size, loading the data into a Python dataframe, and splitting it into training and test sets, we plan to build two classification models. The first is a Convolutional Neural Network (CNN), which is frequently used with this dataset, as seen in the work of Goyal et al. and Rezvantalab et al. Because we are new to deep learning and the images are small, we plan to start with a basic architecture that uses small convolution kernels (likely 3x3 to begin with) and a limited number of hidden layers. As we learn more about CNNs, their activation functions, pooling layers, etc., we hope to refine the architecture for a better balance of speed and accuracy. We also want to explore Support Vector Machines (SVMs). Because SVMs are inherently binary classifiers, we plan to train an SVM to recognize one specific disease category and report whether an image depicts that disease or not.
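
The data preparation steps described above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than our final pipeline. The helper names (row_to_image, stratified_split, one_vs_rest_labels) are hypothetical, and the pixel layout (row-major, channels last) is an assumption about the CSV files that should be checked against the actual data.

```python
import random
from collections import defaultdict


def row_to_image(pixels, height, width, channels=3):
    """Reshape one flat row of pixel values into a height x width x channels nested list.

    Assumes the pixels are stored row by row with the channel values of each
    pixel adjacent to one another (an assumption, not confirmed by the dataset).
    """
    expected = height * width * channels
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} values, got {len(pixels)}")
    values = iter(pixels)
    return [
        [[next(values) for _ in range(channels)] for _ in range(width)]
        for _ in range(height)
    ]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling the test set separately within each class.

    Every class contributes at least one test sample, so a class with a
    single example ends up only in the test set.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def one_vs_rest_labels(labels, positive_class):
    """Binary targets for a single SVM: 1 for the chosen class, 0 for all others."""
    return [1 if label == positive_class else 0 for label in labels]


# Toy example: ten samples, two classes, 20% held out from each class.
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
train_idx, test_idx = stratified_split(toy_labels, test_fraction=0.2)
binary_targets = one_vs_rest_labels(toy_labels, positive_class=1)
```

Splitting within each class keeps rare categories represented in both sets, and the binary relabeling is what lets a two-class SVM answer "is this image category k or not". In practice a library routine would likely replace these helpers.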

Figure: HAM10000 data examples. Source: Kaggle.com

Potential Results

Since our dataset contains 7 diagnostic categories of skin lesion images, the model will output a number from 0 to 6 indicating the category assigned to each image. We plan to evaluate the results with metrics such as accuracy and recall to measure how well the model predicts the correct category. Because the dataset also includes features such as lesion location and patient age, we hope to identify the features that most influence the results, which could give healthcare workers a list of traits associated with greater susceptibility to skin cancer. Ideally, this would help them warn patients who are at higher risk for a particular type of skin cancer based on their medical history or other factors.
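
As a rough illustration of the evaluation described above, the sketch below computes overall accuracy and per-class recall for labels 0-6. The function names and the toy labels are hypothetical and are not results from our models. Per-class recall is reported separately because overall accuracy can hide poor performance on a rarely seen category.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred, num_classes=7):
    """Recall for each class: correctly predicted samples / all true samples of that class.

    Classes with no true samples get None, since recall is undefined for them.
    """
    support = Counter(y_true)   # number of true samples per class
    true_pos = Counter()        # number of correct predictions per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            true_pos[t] += 1
    return {
        c: (true_pos[c] / support[c] if support[c] else None)
        for c in range(num_classes)
    }


# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
```

For a medical application, recall on the malignant categories is arguably the number to watch most closely, since a missed cancer is usually costlier than a false alarm.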

Proposed Timeline
October 12

Convert the HAM10000 dataset into a Python dataframe for processing

October 19

Finish test/train split

November 2

Formal checkpoint - CNN Classification Model

November 16

Project Midterm Report

November 30

Both classification models complete; programming work effectively finished

December 7

Project Final Report Due

Responsibilities
References

Goyal, M., Hassanpour, S., & Yap, M. H. (2018). Region of Interest Detection in Dermoscopic Images for Natural Data-augmentation. arXiv. https://arxiv.org/pdf/1807.10711.pdf

Rezvantalab, A., Safigholi, H., & Karimijeshni, S. (2018). Dermatologist Level Dermoscopy Skin Cancer Classification Using Different Deep Learning Convolutional Neural Networks Algorithms. arXiv. https://arxiv.org/pdf/1810.10348.pdf

Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., & Thrun, S. (2017). Dermatologist-level classification of skin cancer with Deep Neural Networks. Nature, 542(7639), 115–118. https://doi.org/10.1038/nature21056