JenHauen16/Dataset_merge
Merging 2 Datasets based on protein changes

This script was designed to merge two datasets from two sequencing assays: a Foundation Medicine assay and TruSight Oncology 500. The Python script first converts the protein change notation in both datasets to a common format, then parses the protein change information and uses it as the key on which the datasets are merged. Other information, such as amplifications, tumor mutation burden, and microsatellite status, is also parsed from the datasets. pandas is used for the data clean-up and merging, and openpyxl is used to write the output to an Excel file and to format it for readability. Because the datasets are irregular, two functions manually clean up data that could not be handled by the general rules and identify the rows where specific blocks of data start and stop.
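As a rough illustration of the format synchronization step, the sketch below converts a protein change to a single one-letter form. It is not the repository's code: the README does not say which notation each assay uses, so the sketch assumes one source reports three-letter HGVS-style changes (such as p.Val600Glu) and the other reports one-letter changes (such as V600E). The function name normalize_protein_change is hypothetical.

```python
import re

# Standard three-letter to one-letter amino acid codes, plus "Ter" for stop.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Case-sensitive alternation, so upper-case one-letter input is left alone.
_AA_PATTERN = re.compile("|".join(THREE_TO_ONE))


def normalize_protein_change(raw):
    """Return the change in one-letter form without a 'p.' prefix.

    Hypothetical helper: the input formats handled here are assumptions,
    not something the README specifies.
    """
    text = raw.strip()
    if text.startswith("p."):
        text = text[2:]
    # Drop the parentheses used for predicted changes, e.g. p.(Gly12Asp).
    text = text.strip("()")
    return _AA_PATTERN.sub(lambda m: THREE_TO_ONE[m.group(0)], text)


if __name__ == "__main__":
    for value in ["p.Val600Glu", "V600E", " p.(Gly12Asp) "]:
        print(repr(value), "->", normalize_protein_change(value))
```

Once both datasets carry a column normalized this way, that column can serve as the key passed to the `on` argument of pandas `DataFrame.merge`. Whether an inner or outer join is appropriate depends on whether variants seen by only one assay should be kept, which the README does not state.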
About
Merge on protein change information from 2 NGS assays