Why did I choose this paper?
- It is a natural product database that aligns with our interests and is now integrated into COCONUT 2.0.
- The data were mined from the literature. LitSuggest was used to discard reviews and low-scoring articles (score below 0.6), which also involved fine-tuning the model for optimal performance.
- This is one of the newest databases that provides extensive information about molecules, including their source organisms, a key feature highly relevant to COCONUT.
- Endophytic microorganisms, residing within plant tissues, play a vital role in biotic and abiotic stress responses by producing diverse natural products (NPs). Approximately 49.5% of FDA-approved drugs are derived from NPs or their derivatives, making endophytes a promising source for novel bioactive compounds. Despite this, existing databases lack a comprehensive focus on endophytes. The newly developed EMNPD database addresses this gap, providing open access to curated data on endophytic microorganism natural products and their bioactivities.
Key Statistics
- Data: Includes physicochemical properties, ADMET information, and fermentation conditions
- Accessibility: Open-access database, available without registration at the EMNPD website
Comparative Summary of Microbial NP Databases

| Database | Focus | Compounds | Source organisms |
|---|---|---|---|
| EMNPD | Endophyte NPs | 6,632 | 1,017 |
| MyxoDB | Myxobacterial | 674 | |
| mVOC | Microbial volatiles | 2,061 | 1,034 |
| NPcVar | Plant & microbial | 2,201 | 694 |
| StreptomeDB | Streptomycetes | 6,524 | 3,302 |
| CMNPD | Marine | 31,561 | 3,354 |
| NPAtlas | Microbial | 33,372 | |
| NPASS | Plant & microbial | 96,481 | 32,287 |
Construction and Content
Data Extraction and Curation
- Sources: Data gathered from PubMed, keyword searches, and computational tools.
- Filtration Process:
  - Initial collection: 2,600 articles.
  - Removal of reviews and low-scoring articles using LitSuggest (threshold <0.6).
  - Final curation: 1,000 articles selected.
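The filtration step above can be sketched as a simple score filter. This is a minimal illustration, not the authors' pipeline: the `Article` record, its field names, and the sample values are hypothetical, and the relevance scores are assumed to have been exported from LitSuggest beforehand. Keeping an article that scores exactly 0.6 is also an assumption; the source only says that articles below the threshold were removed.

```python
from dataclasses import dataclass

THRESHOLD = 0.6  # from the text: articles scoring below 0.6 were discarded


@dataclass
class Article:
    pmid: str        # hypothetical identifier field
    score: float     # relevance score, assumed to come from LitSuggest
    is_review: bool  # hypothetical flag marking review articles


def filter_articles(articles, threshold=THRESHOLD):
    """Keep non-review articles whose score is at or above the threshold."""
    return [a for a in articles if not a.is_review and a.score >= threshold]


# Made-up sample records, for illustration only.
sample = [
    Article("A1", 0.91, False),
    Article("A2", 0.45, False),  # dropped: low score
    Article("A3", 0.88, True),   # dropped: review
    Article("A4", 0.60, False),  # kept: exactly at the threshold
]

kept = filter_articles(sample)
print([a.pmid for a in kept])
```

In practice the surviving articles would still go through manual curation, as the final step in the list indicates.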
Data Collection and Processing
- Compound Characterization:
  - Structures were retrieved with PubChemPy or drawn manually with tools such as ChemDraw and KingDraw.
  - Compounds were classified with ClassyFire, and ADMET properties were predicted with ADMETlab 2.0.
- Taxonomic Information:
  - Endophyte taxonomy derived from NCBI Taxonomy.
  - Host plant and geographic data integrated for added context.
- Biological Activity Data:
  - Categorized into antibacterial, cytotoxic, anti-inflammatory, etc.
  - Detailed bioactivity records linked to targets, including proteins, cell lines, and organisms.
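One way to picture bioactivity records linked to targets is a small record type plus a grouping helper. The sketch below is only an assumption about how such data could be modelled: the class, its field names, and the sample identifiers are hypothetical and do not reflect the actual EMNPD schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BioactivityRecord:
    compound_id: str  # hypothetical compound identifier
    category: str     # e.g. "antibacterial", "cytotoxic", "anti-inflammatory"
    target_type: str  # "protein", "cell line", or "organism"
    target_name: str


def group_by_category(records):
    """Map each activity category to the set of compounds reported for it."""
    grouped = defaultdict(set)
    for rec in records:
        grouped[rec.category].add(rec.compound_id)
    return dict(grouped)


# Made-up records, for illustration only.
records = [
    BioactivityRecord("NP-0001", "antibacterial", "organism", "Staphylococcus aureus"),
    BioactivityRecord("NP-0001", "cytotoxic", "cell line", "HeLa"),
    BioactivityRecord("NP-0002", "cytotoxic", "cell line", "A549"),
]

by_category = group_by_category(records)
print(sorted(by_category["cytotoxic"]))
```

Keeping the target type separate from the target name lets the same compound carry several activities against different kinds of targets.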
Database Content and Statistics
- Chemical Diversity:
  - 21 chemical superclasses represented.
  - Majority of compounds adhere to Lipinski's "Rule of Five."
- Taxonomic Diversity:
  - Distribution: Fungi 87.5%, Bacteria 12.5%.
Utility and Discussion
Web Interface
- Search Capabilities:
  - Advanced search with Boolean operators for NPs, targets, and bioactivities.
  - Structure search enabled through the Ketcher molecular editor.
- Browsing Features:
  - Visual tools like bar, tree, and sunburst charts for exploring data.
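To illustrate what an advanced search with Boolean operators amounts to, the sketch below combines simple field predicates with AND, OR, and NOT. It is not the EMNPD implementation: the helper names, record fields, and sample records are all hypothetical.

```python
def field_contains(field, text):
    """Predicate: the record's field contains the text (case-insensitive)."""
    needle = text.lower()
    return lambda rec: needle in str(rec.get(field, "")).lower()


def all_of(*preds):
    """Boolean AND over predicates."""
    return lambda rec: all(p(rec) for p in preds)


def any_of(*preds):
    """Boolean OR over predicates."""
    return lambda rec: any(p(rec) for p in preds)


def negate(pred):
    """Boolean NOT of a predicate."""
    return lambda rec: not pred(rec)


# Made-up records, for illustration only.
records = [
    {"name": "compound A", "activity": "antibacterial", "target": "Staphylococcus aureus"},
    {"name": "compound B", "activity": "cytotoxic", "target": "HeLa"},
    {"name": "compound C", "activity": "antibacterial", "target": "Escherichia coli"},
]

# activity contains "antibacterial" AND NOT target contains "coli"
query = all_of(
    field_contains("activity", "antibacterial"),
    negate(field_contains("target", "coli")),
)
hits = [r["name"] for r in records if query(r)]
print(hits)
```

Because each operator returns another predicate, queries nest freely, for example `any_of(query, field_contains("target", "HeLa"))`.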
Downloads
- All data are freely downloadable, and Docker support is provided for local deployment.
- Git: https://github.com/boilism/EMNPD
- Data: https://figshare.com/articles/dataset/EMNPD_Download_Data/24078474
Conclusion
Endophytic microorganisms are a rich source of novel secondary metabolites, known for their diverse structures and significant biological activities, particularly in antimicrobial and anticancer research. Despite the frequent discovery of new and active natural products (NPs) from these microorganisms, the integration of this data into large-scale databases is often slow, highlighting the importance of efficient information sharing for research and development.
To address this, EMNPD was developed. The database provides extensive data and interactive visualisation tools to aid in exploring the chemical diversity of these NPs. It is fully searchable and downloadable, and is designed to support various research perspectives. As interest in endophytic microorganisms continues to grow, EMNPD aims to become an essential resource for advancing drug discovery.