Jul 16, 2012

Analyzing SNP data

I hope to start on SNP data analyses in earnest today. I am attending a workshop on data cleaning and quality control of genetic data this morning. I then have a meeting to see what data I can get for the analyses.

The first step?

Chucked out the evil spawn of Linux known as Fedora and installed Ubuntu. And what a difference in display quality! Now for the brass tacks:

  1. Installed PLINK 1.07-1 (Ubuntu Software Center > PLINK).
  2. Can whatever PLINK does be done using R alone?
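
To get a feel for question 2, here is a minimal sketch of the kind of per-SNP filtering PLINK offers (its `--geno` and `--maf` options), written in plain Python purely for illustration. I am assuming genotypes are coded as 0/1/2 copies of one allele with `None` for a missing call. The function names and the thresholds are my own placeholders, not PLINK defaults or anything from the workshop.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1 or 2 copies of one allele; None marks a missing call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype at this SNP."""
    if not genotypes:
        return 0.0
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> Optional[float]:
    """Minor allele frequency among called genotypes (None if nothing was called)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each called sample carries two alleles.
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def passes_qc(
    genotypes: Sequence[Genotype],
    min_call_rate: float = 0.95,  # illustrative threshold only
    min_maf: float = 0.01,        # illustrative threshold only
) -> bool:
    """Keep a SNP only if it is well genotyped and not too rare."""
    maf = minor_allele_frequency(genotypes)
    return (
        call_rate(genotypes) >= min_call_rate
        and maf is not None
        and maf >= min_maf
    )
```

For example, `passes_qc([0, 1, 2, None, 1, 0, 0, 1, 0, 0])` rejects the SNP at the placeholder call-rate threshold because one of the ten genotypes is missing. Simple per-SNP summaries like these look easy enough to reproduce outside PLINK; whether that holds for everything PLINK does is still the open question.
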
The baseline variables shipped along with the SNPs include:
  1. StudyID
  2. Case_Control
  3. age {at reference date?}
  4. A5 {Hispanic ethnicity?}
  5. A7 {Highest grade of school completed}
  6. D1 {Age at menarche}
  7. menopause {Menopausal status}
  8. agemenopause {Age at menopause}
  9. bmi
  10. tanita_bmi {Tanita scale Body Mass Index (kg/m^2)}
  11. bmi_self
  12. agefirstbirth {Age at first live birth}
  13. birthcount {Number of live births}
  14. agefirstpreg {Age at first full-term pregnancy}
  15. pregnum {Number of full-term pregnancies}
  16. breastfeed
  17. breastfeedmonths
  18. G8
  19. F1
  20. waist
  21. hip
  22. waisthip
  23. race
  24. eversmok
  25. calories
  26. carbo
  27. fat
  28. protein
  29. fiber
  30. fibh2o
  31. fibinso
  32. F5
  33. fhbc
Hmm, what are these...
  1. phenotype
  2. state
  3. control_source

Outcomes

  1. ERStatus
  2. PRStatus
  3. her2status
  4. TumorGrade
  5. TumorStage
  6. invasive
  7. tumor_subtype
  8. triplenegative
  9. Eurpr
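
I have not seen the codebook yet, so I do not know how these outcomes are coded. As a sanity check I could run later, here is a hedged sketch of how `triplenegative` would normally relate to `ERStatus`, `PRStatus` and `her2status`: a tumor is triple negative when all three receptors are negative. The string values `"positive"` and `"negative"` and the function name are assumptions for illustration, not the dataset's actual coding.

```python
from typing import Optional


def is_triple_negative(
    er: Optional[str], pr: Optional[str], her2: Optional[str]
) -> Optional[bool]:
    """Derive triple-negative status from three receptor statuses.

    Assumed coding: "positive", "negative", or None when unknown.
    Returns None when the available values cannot settle the answer.
    """
    statuses = (er, pr, her2)
    # A single positive receptor rules out triple negative, even with gaps.
    if any(s == "positive" for s in statuses):
        return False
    # All three must be known and negative to call it triple negative.
    if all(s == "negative" for s in statuses):
        return True
    return None
```

Comparing a derived value like this against the shipped `triplenegative` column would be one quick way to catch coding surprises once the real codes are known.
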
