Bridging Molecular Biology and Data Science

Hello! I am passionate about the intersection of data science and biochemistry. My adoption from China led me to seek ancestry DNA testing, which opened my eyes to the informative power of genomic data science. My background is in synthetic biology and genetic engineering research and in molecular biology operations, and I earned my B.S. in Biochemistry with a minor in Chinese from UT Austin.

Currently, I am an M.S. Data Science student at NYU's Center for Data Science (CDS). I hold leadership roles in several CDS organizations, including Women in Data Science, the Graduate Community Building group, and CDS Admissions Ambassadors. I am always happy to discuss NYU's MSDS program or navigating a transition into data science.

Previously, I worked as a Research Technician for a women's health biotech company. When I began, I focused mostly on sample management and operations. Through self-advocacy, I grew to balance those responsibilities with database management, Python/SQL scripting, creating visualizations, and even conducting interviews!

I am excited about rock climbing, dancing, sharing my adoptee experience, building community, film photography, and applying data science to advance human health!

Skills

  • 2 years of Python (pandas, NumPy)
  • 2 years of SQL (MySQL, Snowflake)
  • Machine Learning (regression, classification, supervised learning, SHAP)
  • Natural Language Processing (spaCy, TweetNLP)
  • Data Visualization (Matplotlib, Plotly, Seaborn, Dash, Looker Studio)
  • Process Improvement
  • Training, Writing SOPs
  • Conducting Interviews
  • DNA Isolation, Next Generation Sequencing, ddPCR
  • Research project planning and design

Data Projects

Python Script to Automate and Expand Data Uploads

Developed Python ETL scripts to automate the aggregation, preprocessing, and validation of external vendor data for upload to Snowflake. Reduced upload time by 30x. Trained 20+ non-technical employees over 6 sessions to use the scripts.

January - July 2024

Regression Model to Predict JIRA Ticket Turnaround Time

Trained a random forest (RF) model on 700+ JIRA sample requests to identify which factors most significantly affect turnaround time (TAT). Presented the project to a non-technical audience, recommending process changes to improve JIRA workflows and increase reporting accuracy.

April - July 2024

Analysis of Halted Clinical Trials

Analyzed ~13K clinical trial records in Python, creating an interactive dashboard with Dash and Plotly. Published reports on Medium and LinkedIn.

August - November 2023

Medium | LinkedIn | GitHub

NLP Analysis of Generative AI Tweets

Performed sentiment analysis and entity recognition on 56K tweets about generative AI posted between April 2022 and April 2023.

December 2023 - February 2024 [ON HOLD]

GitHub

Image generated with AI through Microsoft Bing

Synthetic Biology Research

I was an Undergraduate Research Assistant for the Keitz Group at the University of Texas at Austin from January 2019 to May 2022.

Engineering a Tunable Fe(II)-Responsive Two-Component System in Shewanella oneidensis

Independent research project, presented at the Fall Undergraduate Research Symposium. Awarded second place in the synthetic biology category.
Please contact me if you would like to view the project abstract or presentation materials.

Transcriptional Regulation of Synthetic Polymer Networks

Austin J. Graham, Christopher M. Dundas, Gina Partipilo, Ismar E. Miniel Mahfoud, Thomas FitzSimons, Rebecca Rinehart, Darian Chiu, Avery E. Tyndall, Adrianne M. Rosales, Benjamin K. Keitz. bioRxiv 2021.10.17.464678; doi: https://doi.org/10.1101/2021.10.17.464678

Film Photography