ACS Fall 2023
Demo paper

Expanded Dataset for Improved Prediction of Chemical Biodegradability


Biodegradability is a crucial factor in assessing the long-term environmental impact of chemicals, but determining it experimentally is time-consuming and laborious. European legislators have incorporated chemical persistence into the Registration, Evaluation, Authorization and Restriction of Chemicals (REACH) regulation for the assessment of chemicals. However, only 61% of chemicals produced or imported in quantities of over 1000 tons per year have information on biodegradability. As a potential solution, REACH encourages in silico approaches such as quantitative structure-activity relationship (QSAR) models to predict the biodegradability of compounds. To support the development of such models, this work extends the "All-Public set," an aggregated dataset with biodegradability information on 2830 compounds from various sources. We add biodegradability information on 3707 new compounds from the European Chemicals Agency (ECHA) database, resulting in a dataset of 6537 compounds. With this larger dataset, we aim to promote the development of more accurate QSAR models for predicting biodegradability. Such models would enable more efficient assessment of the potential environmental impact of chemicals and facilitate the development of more sustainable, environmentally friendly products.