I've parsed a very large TCGA cancer SSM (simple somatic mutation) file down to the essential information.
Each parsed record is in the following format (field names first, then an example record):
['Gene name', 'Ensembl Gene ID', 'Chromosome', 'Chromosome start', 'Cancer Type']
['NTRK1', 'ENSG00000198400', '1','156849827', 'Prostate Adenocarcinoma (TCGA, US)']
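For the later lookups it may help to wrap each record in a small typed structure. This is only a minimal sketch: `Mutation` and `parse_record` are hypothetical names, and it assumes every record has exactly the five fields shown above, in that order.

```python
from typing import NamedTuple, Sequence


class Mutation(NamedTuple):
    """One parsed SSM record (hypothetical container, not part of any library)."""
    gene_name: str
    ensembl_gene_id: str
    chromosome: str
    position: int
    cancer_type: str


def parse_record(fields: Sequence[str]) -> Mutation:
    """Convert a five-element list of strings into a Mutation."""
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    gene, gene_id, chrom, start, cancer = (f.strip() for f in fields)
    # Chromosome stays a string so that 'X', 'Y' and 'MT' are handled too.
    return Mutation(gene, gene_id, chrom, int(start), cancer)


record = parse_record(
    ['NTRK1', 'ENSG00000198400', '1', '156849827',
     'Prostate Adenocarcinoma (TCGA, US)']
)
```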
From there I would like to take each mutation and:
- Map the chromosomal position to a known SNP (i.e. get an rs ID as output).
- See if this SNP is found in a 3'UTR.
- See if this SNP is found in a miRNA.
- Determine whether it is a missense or sense mutation.
- Collect any relevant GenBank (or similar) IDs.
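One possible starting point for the first item is the Ensembl REST API, whose `overlap/region` endpoint returns the known variants overlapping a position. The sketch below is hedged in several ways: it uses only the standard library rather than BioPython, it assumes the coordinates are on GRCh38 (GRCh37 data is served from `grch37.rest.ensembl.org` instead), and `overlap_url`, `rs_ids` and `fetch_variants` are hypothetical helper names. The returned features are expected to carry an `id` field; check the live response for other fields such as the predicted consequence before relying on them.

```python
import json
import urllib.request

# Assumption: positions are GRCh38 coordinates. For GRCh37 use
# "https://grch37.rest.ensembl.org" instead.
ENSEMBL_REST = "https://rest.ensembl.org"


def overlap_url(chromosome, position, server=ENSEMBL_REST):
    """Build the overlap/region URL for a single-base region."""
    region = f"{chromosome}:{position}-{position}"
    return f"{server}/overlap/region/human/{region}?feature=variation"


def rs_ids(features):
    """Keep only dbSNP-style identifiers (rs...) from a list of feature dicts."""
    return [f["id"] for f in features if str(f.get("id", "")).startswith("rs")]


def fetch_variants(chromosome, position, server=ENSEMBL_REST):
    """Query Ensembl (needs network access) and return the decoded JSON list."""
    request = urllib.request.Request(
        overlap_url(chromosome, position, server),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example usage (not run here because it needs network access):
# features = fetch_variants('1', 156849827)
# print(rs_ids(features))
```

For a file this large, one request per mutation will be slow and may hit the server's rate limits, so a local annotation tool or a bulk download may be a better fit than this per-position lookup.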
I'd like to do this in Python (I think BioPython is suited for this) so the results can feed into downstream applications.