Open-Source Software Development in Cheminformatics: A Qualitative Analysis of Rationales

Author

Noura Rayya

Published

June 26, 2024

Why discuss this paper?

I chose the paper Open-Source Software Development in Cheminformatics: A Qualitative Analysis of Rationales (Pernaa et al. 2023) for the seminar on current topics in cheminformatics because:

  • It introduces “qualitative content analysis”, a method that might be unfamiliar in our group’s regular work, along with the software used for it, ATLAS.ti 9.
  • It discusses open source in cheminformatics, which is relevant to our work in NFDI4Chem.
  • It covers business and business politics, which might be of interest to us at this stage of our careers.

Context

The paper identifies and promotes the rationales for open-source development in cheminformatics. It extracts them from published sources and categorizes them so that they can inform the design of rationales in future software development projects. The authors also transparently cover the arguments against the open-source approach.

Cheminformatics in chemistry: Brief history

Cheminformatics has been used in chemistry since the 1940s (Chen 2006). King, Cross, and Thomas (1946) may have been the first scholars to apply computers in chemistry research. However, the paper by Ray and Kirsch (1957), which describes an algorithm for substructure searching, might be considered the first actual cheminformatics paper.

Although cheminformatics as a field has been around for many decades, much of the cheminformatics research has been conducted in industrial laboratories, not academia. Because of this, many applications and methods are not published due to intellectual-property-rights issues (Willett 2011).

Problem setting

  • The development of the field requires open-source technology.
  • Open-source software development in cheminformatics is necessary for cheminformatics education.
  • Open source and open data enable reproducibility, which supports scientific reliability.
  • Open-source code significantly lowers research costs.

Approach

The authors collected research articles that use “open source”-related concepts and analyzed them using qualitative content analysis in ATLAS.ti 9. They extracted relevant expressions, derived rationale subcategories from them, and then grouped the subcategories into main categories. Two examples of this classification:

| Original Expression | Sub-Category | Main Category |
|---|---|---|
| “Despite these efforts, no general purpose deterministic structure generator has been developed in an open source format so far.” (Peironcely et al. 2012) | No available open-source alternative | Develop New Software |
| “The ChemoPy package aims at providing the user with comprehensive implementations of these descriptors in a unified framework to allow easy and transparent computation.” (Cao et al. 2013) | Clear workflow | Improve Usability |
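
The three-level structure of this analysis (expression, sub-category, main category) can be sketched as a small data structure. This is only an illustration: the authors worked in ATLAS.ti 9, not in code, and the names `CodedExpression` and `group_by_main_category` are hypothetical. The two records are the examples from the table above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedExpression:
    """One quoted expression with the codes assigned to it (hypothetical type)."""
    quote: str
    source: str
    sub_category: str
    main_category: str


# The two example expressions from the table above.
coded = [
    CodedExpression(
        quote=("Despite these efforts, no general purpose deterministic "
               "structure generator has been developed in an open source "
               "format so far."),
        source="Peironcely et al. 2012",
        sub_category="No available open-source alternative",
        main_category="Develop New Software",
    ),
    CodedExpression(
        quote=("The ChemoPy package aims at providing the user with "
               "comprehensive implementations of these descriptors in a "
               "unified framework to allow easy and transparent computation."),
        source="Cao et al. 2013",
        sub_category="Clear workflow",
        main_category="Improve Usability",
    ),
]


def group_by_main_category(expressions):
    """Build main category -> sub-category -> list of sources."""
    grouped = defaultdict(lambda: defaultdict(list))
    for expr in expressions:
        grouped[expr.main_category][expr.sub_category].append(expr.source)
    # Convert nested defaultdicts to plain dicts for predictable behavior.
    return {main: dict(subs) for main, subs in grouped.items()}


hierarchy = group_by_main_category(coded)
for main, subs in hierarchy.items():
    print(main)
    for sub, sources in subs.items():
        print(f"  {sub}: {', '.join(sources)}")
```

Grouping by main category first mirrors the direction of the analysis: sub-categories emerge from the expressions, and main categories are formed from the sub-categories.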

Results

The analysis produced six main rationale categories for open-source cheminformatics software development. The perspective of the rationale can be either general or specific. Most rationales produce technological outcomes, such as new software, frameworks, interfaces, and processes.

| # | Rationale | Perspective | Outcome |
|---|---|---|---|
| 1 | Develop New Software | General/Specific | Technological |
| 2 | Update Current Features, Tools, or Processes | Specific | Technological |
| 3 | Improve Usability | General | Technological |
| 4 | Support Open-source Development and Open Science | General/Specific | Technological/Political |
| 5 | Fulfill Chemical Information Needs | Specific | Content-driven |
| 6 | Support Chemistry Learning and Teaching | General/Specific | Pedagogical |

Takeaways

  • The most central challenge hindering the academic development of cheminformatics has been the field’s industrial background: much of the research was conducted in industrial laboratories and never published.
  • Open-source development can bridge academic and industrial stakeholders.

References

Cao, Dong-Sheng, Qing-Song Xu, Qian-Nan Hu, and Yi-Zeng Liang. 2013. “ChemoPy: Freely Available Python Package for Computational Biology and Chemoinformatics.” Bioinformatics 29 (8): 1092–94.
Chen, William Lingran. 2006. “Chemoinformatics: Past, Present, and Future.” Journal of Chemical Information and Modeling 46 (6): 2230–55.
Guo, Jeff, and Philippe Schwaller. 2024. “It Takes Two to Tango: Directly Optimizing for Constrained Synthesizability in Generative Molecular Design.” arXiv. https://doi.org/10.48550/ARXIV.2410.11527.
King, Gilbert W, Paul C Cross, and George B Thomas. 1946. “The Asymmetric Rotor III. Punched-Card Methods of Constructing Band Spectra.” The Journal of Chemical Physics 14 (1): 35–42.
Landrum, Gregory A., Jessica Braun, Paul Katzberger, Marc T. Lehner, and Sereina Riniker. 2024. “Lwreg: A Lightweight System for Chemical Registration and Data Storage.” Journal of Chemical Information and Modeling 64 (16): 6247–52. https://doi.org/10.1021/acs.jcim.4c01133.
Orsi, Markus, and Jean-Louis Reymond. 2024. “One Chiral Fingerprint to Find Them All.” Journal of Cheminformatics 16 (1): 53.
Peironcely, Julio E, Miguel Rojas-Chertó, Davide Fichera, Theo Reijmers, Leon Coulier, Jean-Loup Faulon, and Thomas Hankemeier. 2012. “OMG: Open Molecule Generator.” Journal of Cheminformatics 4: 1–13.
Pernaa, Johannes, Aleksi Takala, Veysel Ciftci, José Hernández-Ramos, Lizethly Cáceres-Jensen, and Jorge Rodríguez-Becerra. 2023. “Open-Source Software Development in Cheminformatics: A Qualitative Analysis of Rationales.” Applied Sciences 13 (17): 9516.
Ray, Louis C, and Russell A Kirsch. 1957. “Finding Chemical Records by Digital Computers.” Science 126 (3278): 814–19.
Willett, Peter. 2011. “Chemoinformatics: A History.” Wiley Interdisciplinary Reviews: Computational Molecular Science 1 (1): 46–56.