Improving Antibody CDR Template Selection by Structural Cluster Prediction

dc.contributor.advisor: Gray, Jeffrey J.
dc.contributor.committeeMember: Schulman, Rebecca
dc.creator: Long, Xiyao
dc.date.accessioned: 2018-05-22T03:50:57Z
dc.date.available: 2018-05-22T03:50:57Z
dc.date.created: 2017-12
dc.date.issued: 2017-12-28
dc.date.submitted: December 2017
dc.date.updated: 2018-05-22T03:50:58Z
dc.description.abstract: With the advent of high-throughput sequencing, antibody sequences can be acquired much faster than their corresponding structures, creating a need for rapid structure determination. Computational modeling is the only feasible method for high-throughput structure determination; however, it does not always produce highly accurate models. In antibody modeling, the framework regions are well conserved and readily modeled to sub-Angstrom accuracy, but accurate modeling of the complementarity determining region (CDR) loops remains elusive. This challenge must be overcome if models are to be used to study antibody function or to design antibodies. Of the six CDR loops, the non-H3 CDR loops (H1, H2, and L1–L3) are easier to model than the H3 loop because they are shorter and vary less in structure and length. Moreover, most non-H3 CDR loop structures can be grouped by CDR and length, and each group can be clustered into a few canonical structure clusters. The ability to predict the correct cluster of a CDR accurately from sequence alone could therefore improve structural modeling. In this thesis, I assessed how well current modeling techniques identify CDR canonical structures from sequence alone, and I improved the retrieval accuracy. First, I benchmarked the current CDR loop modeling method in Rosetta and found that it failed to predict the correct canonical structure cluster for 19% of CDRs. Next, I assessed the significance of these failures by comparison with a random cluster selection model. Then, to improve the accuracy of template selection, I trained a machine learning classifier for each CDR and length group, using sequences as features, and found that the classifiers improved the retrieval of canonical structures. This improvement could not be achieved with residue position rules alone.
Finally, I propose incorporating canonical class prediction via machine learning to improve canonical structure retrieval accuracy, and I expect this improvement to grow as the less populated CDR clusters become more enriched.
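The abstract describes two computational steps: comparing cluster predictions against a random cluster selection model, and training one sequence-based classifier per CDR and length group. The record does not say which classifier or which random model the thesis used, so the Python sketch below rests on labeled assumptions. It swaps in a per-position naive Bayes classifier (residue identity at each position of equal-length CDR sequences as features) as a stand-in, and it computes two plausible random baselines: uniform selection among observed clusters, and selection weighted by cluster population. The helper names, toy sequences, and cluster labels are hypothetical and are not data or code from the thesis.

```python
import math
from collections import Counter, defaultdict

# Assumption: a 20-letter amino-acid alphabet, used only for Laplace smoothing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class PositionNaiveBayes:
    """Hypothetical stand-in classifier: naive Bayes over residue identity
    at each position of equal-length CDR sequences (one CDR/length group)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace pseudocount for residues unseen in a cluster

    def fit(self, seqs, labels):
        self.length = len(seqs[0])
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # counts[cluster][position][residue] -> number of training sequences
        self.counts = defaultdict(lambda: defaultdict(Counter))
        for seq, label in zip(seqs, labels):
            if len(seq) != self.length:
                raise ValueError("all sequences in a group must share a length")
            for pos, aa in enumerate(seq):
                self.counts[label][pos][aa] += 1
        return self

    def log_posterior(self, seq, label):
        """Unnormalized log P(cluster | sequence)."""
        n_c = self.class_counts[label]
        score = math.log(n_c / self.n)  # cluster prior from its population
        denom = n_c + self.alpha * len(AMINO_ACIDS)
        for pos, aa in enumerate(seq):
            score += math.log((self.counts[label][pos][aa] + self.alpha) / denom)
        return score

    def predict(self, seq):
        if len(seq) != self.length:
            raise ValueError("query length does not match this group's model")
        return max(self.class_counts, key=lambda c: self.log_posterior(seq, c))


def train_per_group(records):
    """Fit one classifier per (CDR, length) group.
    records: iterable of (cdr_name, sequence, cluster_label)."""
    groups = defaultdict(lambda: ([], []))
    for cdr, seq, cluster in records:
        seqs, labels = groups[(cdr, len(seq))]
        seqs.append(seq)
        labels.append(cluster)
    return {key: PositionNaiveBayes().fit(s, labs) for key, (s, labs) in groups.items()}


def random_baseline_accuracy(labels):
    """Expected accuracy of two assumed random cluster-selection models:
    uniform over observed clusters, and weighted by cluster population."""
    counts = Counter(labels)
    n = len(labels)
    uniform = 1 / len(counts)
    weighted = sum((c / n) ** 2 for c in counts.values())
    return uniform, weighted


# Hypothetical toy data: made-up L3 sequences with placeholder cluster labels.
records = [
    ("L3", "QQYNSYPLT", "A"),
    ("L3", "QQYYSYPLT", "A"),
    ("L3", "QQWSSNPLT", "B"),
    ("L3", "QQWNSNPLT", "B"),
]
models = train_per_group(records)
model = models[("L3", 9)]
print(model.predict("QQYNSYPLT"))  # A
print(model.predict("QQWSSNPLT"))  # B
print(random_baseline_accuracy(["A", "A", "A", "B"]))  # (0.5, 0.625)
```

Keying models by (CDR, length) mirrors the grouping described in the abstract: within one group every sequence has the same length, so residue identity at each aligned position can serve directly as a feature. The population-weighted baseline equals the sum of squared cluster frequencies, so a group dominated by a single cluster can score well even under random selection, which is one reason a random model is a useful reference point when judging prediction failures.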
dc.format.mimetype: application/pdf
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/58703
dc.language.iso: en_US
dc.publisher: Johns Hopkins University
dc.publisher.country: USA
dc.subject: Antibody
dc.subject: complementarity determining regions
dc.subject: CDRs
dc.subject: Rosetta Antibody
dc.subject: protein structural modeling
dc.title: Improving Antibody CDR Template Selection by Structural Cluster Prediction
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Chemical and Biomolecular Engineering
thesis.degree.discipline: Chemical & Biomolecular Engineering
thesis.degree.grantor: Johns Hopkins University
thesis.degree.grantor: Whiting School of Engineering
thesis.degree.level: Masters
thesis.degree.name: M.S.E.
Files
Original bundle
LONG-THESIS-2017.pdf (10.34 MB, Adobe Portable Document Format)
License bundle
PROQUEST_LICENSE.txt (5.84 KB, Plain Text)
LICENSE.txt (2.67 KB, Plain Text)