Faculty of Computer Science

Research Group Theoretical Computer Science

Oberseminar: Heterogeneous Formal Methods

Date: 2023, July 27
Time: 14:00
Place: online
Author: Smith, Sinan
Title: Ensembles for Molecule Classification (Master's Thesis Defense)


Machine learning has been applied successfully in many fields for years, including cheminformatics, where it is used for tasks such as predicting chemical and biological properties from molecular structure. Mossakowski, Neuhaus, Glauer, Memariani and Hastings (2021) applied machine learning to structure-based chemical ontology classification and obtained promising results. Machine learning matters for this task because chemical datasets are too large to be classified into an ontology manually. An ensemble classifier combines several machine learning classifiers and can offer better classification performance than a single classifier. This thesis experiments with ensemble classifiers for structure-based chemical ontology classification to see whether the previously obtained performance can be improved. First, simple ensemble methods such as majority voting are tried. Owing to the diversity of the classifiers, majority voting performs better than the existing predictions. Next, weighted voting is introduced, in which each classifier's weight depends on its performance. Finally, the more advanced methods of bagging and boosting are examined. Future research on structure-based chemical ontology classification could explore ensemble methods tailored more closely to the problem domain, or combine ensembles with different input encodings or different datasets. Possible titles of future papers are listed in the future work section.
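
To make the two voting schemes named in the abstract concrete, the following is a minimal sketch and not the implementation used in the thesis. It assumes that each classifier outputs a 0/1 decision per ontology class (a multi-label setting) and that the weights are some per-classifier validation score; the function names, the example predictions, and the weight values are all hypothetical.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 decisions by strict majority.

    predictions[i][j] is classifier i's decision for class j (assumed layout).
    """
    n_classifiers = len(predictions)
    n_classes = len(predictions[0])
    combined = []
    for j in range(n_classes):
        votes = sum(p[j] for p in predictions)
        # A class is assigned only if more than half of the classifiers agree.
        combined.append(1 if 2 * votes > n_classifiers else 0)
    return combined


def weighted_vote(
    predictions: Sequence[Sequence[int]], weights: Sequence[float]
) -> List[int]:
    """Combine 0/1 decisions, weighting each classifier by its performance.

    weights[i] is a hypothetical performance score for classifier i,
    for example an F1 score measured on a validation set.
    """
    total = sum(weights)
    n_classes = len(predictions[0])
    combined = []
    for j in range(n_classes):
        score = sum(w * p[j] for p, w in zip(predictions, weights))
        # A class is assigned only if the supporting weight exceeds half the total.
        combined.append(1 if 2 * score > total else 0)
    return combined


# Hypothetical decisions of three classifiers for four ontology classes.
preds = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
# Hypothetical validation scores used as weights.
scores = [0.9, 0.5, 0.3]

print(majority_vote(preds))          # [1, 1, 1, 0]
print(weighted_vote(preds, scores))  # [1, 0, 1, 0]
```

In this toy example the two schemes disagree on the second class: two of three classifiers vote for it, but they are the two with the lowest assumed scores, so their combined weight stays below half of the total.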

Keywords: Structure-based chemical ontology classification, chemical ontology, ChEBI, machine learning, ensemble classifier, output combination, majority voting, weighted voting, bagging, boosting, automated classification, LSTM
