Open-Source Software Development in Cheminformatics: A Qualitative Analysis of Rationales
Why discuss this paper?
I chose the paper “Open-Source Software Development in Cheminformatics: A Qualitative Analysis of Rationales” (Pernaa et al. 2023) for the current topics in cheminformatics seminar for three reasons:
- It introduces “qualitative content analysis”, a method that may be unfamiliar in our group’s regular work, along with the software used for it, ATLAS.ti 9.
- It discusses open source in cheminformatics, which is directly relevant to our work in NFDI4Chem.
- It covers business and business politics, which may be of interest to us at this stage of our careers.
Context
The paper identifies and promotes the rationales for open-source development in cheminformatics. It extracts them from published sources and categorizes them so that they can inform rationale design in future software development projects. The authors also transparently cover the arguments against this approach.
Cheminformatics in chemistry: Brief history
Cheminformatics has been used in chemistry since the 1940s (Chen 2006). King, Cross, and Thomas (1946) may have been the first scholars to apply computers to chemistry research. The first actual cheminformatics paper, however, might be that of Ray and Kirsch (1957), who described an algorithm for substructure searching.
Although cheminformatics as a field has been around for many decades, much of the cheminformatics research has been conducted in industrial laboratories, not academia. Because of this, many applications and methods are not published due to intellectual-property-rights issues (Willett 2011).
Problem setting
- The development of the field requires open-source technology.
- Open-source software development in cheminformatics is necessary for cheminformatics education.
- Open source and open data enable reproducibility which supports scientific reliability.
- Open-source code significantly lowers research costs.
Approach
The authors collected research articles that involve “open source”-related concepts and analyzed them with qualitative content analysis in ATLAS.ti 9. They extracted the relevant expressions, derived rationale sub-categories from them, and then classified the sub-categories into main categories. The following table shows two examples of this coding.
Original Expression | Sub-Category | Main Category |
---|---|---|
“Despite these efforts, no general purpose deterministic structure generator has been developed in an open source format so far.” (Peironcely et al. 2012) | No available open-source alternative | Develop New Software |
“The ChemoPy package aims at providing the user with comprehensive implementations of these descriptors in a unified framework to allow easy and transparent computation.” (Cao et al. 2013) | Clear workflow | Improve Usability |
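
The coding step in the table can be sketched as a small data structure. This is only an illustration of the expression → sub-category → main category hierarchy, not the authors' ATLAS.ti workflow; the names `CodedExpression`, `CODED`, and `group_by_main_category` are hypothetical, and the two records are the two rows from the table above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedExpression:
    """One coded quote (hypothetical record type, for illustration only)."""
    quote: str
    source: str
    sub_category: str
    main_category: str


# The two example rows from the table above.
CODED = [
    CodedExpression(
        quote=("Despite these efforts, no general purpose deterministic "
               "structure generator has been developed in an open source "
               "format so far."),
        source="Peironcely et al. 2012",
        sub_category="No available open-source alternative",
        main_category="Develop New Software",
    ),
    CodedExpression(
        quote=("The ChemoPy package aims at providing the user with "
               "comprehensive implementations of these descriptors in a "
               "unified framework to allow easy and transparent computation."),
        source="Cao et al. 2013",
        sub_category="Clear workflow",
        main_category="Improve Usability",
    ),
]


def group_by_main_category(expressions):
    """Group sources as main category -> sub-category -> list of sources."""
    grouped = defaultdict(lambda: defaultdict(list))
    for expr in expressions:
        grouped[expr.main_category][expr.sub_category].append(expr.source)
    # Convert nested defaultdicts to plain dicts for readable output.
    return {main: dict(subs) for main, subs in grouped.items()}


if __name__ == "__main__":
    for main, subs in group_by_main_category(CODED).items():
        print(main)
        for sub, sources in subs.items():
            print(f"  {sub}: {', '.join(sources)}")
```

Grouping by main category first mirrors the direction of the analysis: sub-categories emerge from the quotes, and main categories are formed by merging related sub-categories.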
Results
The analysis produced six main rationale categories for open-source cheminformatics software development. The perspective of the rationale can be either general or specific. Most rationales produce technological outcomes, such as new software, frameworks, interfaces, and processes.
# | Rationale | Perspective | Outcome |
---|---|---|---|
1 | Develop New Software | General/Specific | Technological |
2 | Update Current Features, Tools, or Processes | Specific | Technological |
3 | Improve Usability | General | Technological |
4 | Support Open-source Development and Open Science | General/Specific | Technological/Political |
5 | Fulfill Chemical Information Needs | Specific | Content-driven |
6 | Support Chemistry Learning and Teaching | General/Specific | Pedagogical |
Takeaways
- The central challenge hindering the academic development of cheminformatics has been the field's industrial background, where intellectual-property concerns keep many applications and methods unpublished.
- Open-source development can bridge academic and industrial stakeholders.