Using R to analyze data from 500 patients with lung squamous cell cancer: which proteins show high expression that correlates with low survival rates?


Arushi A.


This research project will use R to look for evidence that overexpression of particular proteins is associated with progression of lung squamous cell cancer. The Cancer Genome Atlas (TCGA) has a dataset of 500 patients through which survival rate can be correlated with protein expression. Because high protein expression can result from copy number variations (amplifications/deletions), mutations, and methylation, this project will examine several different datasets and aggregate the findings. After identifying proteins whose expression is statistically significantly associated with survival, the next step will be to find literature supporting the possibility that each protein plays a role in cancer. Combining the numerical and written evidence will provide a strong argument for a researcher to invest time and money in studying these proteins.
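As a rough illustration of how survival might be correlated with protein expression in R, the sketch below fits one Cox proportional hazards model per protein and adjusts the p-values for multiple testing. It rests on several assumptions: the data are simulated rather than taken from TCGA, the protein names (PROT_A to PROT_D) and variable names are hypothetical placeholders, and the choice of Cox regression with a Benjamini-Hochberg correction is one common approach, not necessarily the method this project will use.

```r
library(survival)  # recommended package included with standard R installations

set.seed(42)
n <- 500  # cohort size mentioned in the text

# ASSUMPTION: simulated stand-in for an expression matrix
# (rows = patients, columns = proteins). Names are placeholders, not findings.
expr <- matrix(rnorm(n * 4), nrow = n,
               dimnames = list(NULL, c("PROT_A", "PROT_B", "PROT_C", "PROT_D")))

# ASSUMPTION: simulated survival times in which higher PROT_A expression
# shortens survival by construction; the other proteins have no effect.
surv_time <- rexp(n, rate = 0.05 * exp(0.7 * expr[, "PROT_A"]))
event     <- rbinom(n, 1, 0.8)  # 1 = death observed, 0 = censored

# Fit one univariate Cox model per protein and keep the hazard ratio and p-value.
fit_one <- function(protein) {
  fit <- coxph(Surv(surv_time, event) ~ expr[, protein])
  s <- summary(fit)
  data.frame(protein      = protein,
             hazard_ratio = s$coefficients[1, "exp(coef)"],
             p_value      = s$coefficients[1, "Pr(>|z|)"])
}
results <- do.call(rbind, lapply(colnames(expr), fit_one))

# Adjust for testing many proteins (Benjamini-Hochberg false discovery rate).
results$p_adjusted <- p.adjust(results$p_value, method = "BH")

# Candidates: higher expression associated with higher hazard (shorter survival).
candidates <- results[results$hazard_ratio > 1 & results$p_adjusted < 0.05, ]
print(candidates)
```

With real data, the simulated matrix and survival vectors would be replaced by the patients' measured expression values, follow-up times, and vital status. A hazard ratio above 1 indicates only an association between higher expression and shorter survival, which is why the literature search described above remains a necessary next step.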