API Reference
datasets
AbstractDataset
Bases: ABC
Base class for polymer datasets.
Source code in src/polymetrix/datasets/dataset.py
available_features
property
List of available features.
Returns:
Type | Description |
---|---|
list[str] | List of feature names. |
available_labels
property
List of available labels.
Returns:
Type | Description |
---|---|
list[str] | List of label names. |
meta_info
property
List of available metadata fields.
Returns:
Type | Description |
---|---|
list[str] | List of metadata field names. |
psmiles
property
Return the polymer SMILES strings.
Returns:
Type | Description |
---|---|
np.ndarray | Array of polymer SMILES strings. |
__init__()
Initialize a dataset.
__iter__()
__len__()
get_features(idx, feature_names=None)
Get features for specified indices.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | Collection[int] | Indices of entries. | required |
feature_names | Optional[Collection[str]] | Names of features to return. If None, returns all available features. | None |
Returns:
Type | Description |
---|---|
np.ndarray | Array of feature values. |
Source code in src/polymetrix/datasets/dataset.py
get_labels(idx, label_names=None)
Get labels for specified indices.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | Collection[int] | Indices of entries. | required |
label_names | Optional[Collection[str]] | Names of labels to return. If None, returns all available labels. | None |
Returns:
Type | Description |
---|---|
np.ndarray | Array of label values. |
Source code in src/polymetrix/datasets/dataset.py
get_meta(idx, meta_keys=None)
Get metadata for specified indices.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | Collection[int] | Indices of entries. | required |
meta_keys | Optional[Collection[str]] | Names of metadata fields to return. If None, returns all available metadata. | None |
Returns:
Type | Description |
---|---|
np.ndarray | Array of metadata values. |
Source code in src/polymetrix/datasets/dataset.py
get_subset(indices)
Get a subset of the dataset.
Source code in src/polymetrix/datasets/dataset.py
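The index-and-name selection described for get_features, get_labels and get_meta can be sketched with plain Python lists. This is a minimal illustration under stated assumptions, not the polymetrix implementation: ToyDataset, its constructor arguments, the feature names and the row-wise storage are all hypothetical, and the documented methods return np.ndarray rather than nested lists.

```python
from typing import Collection, List, Optional, Sequence


class ToyDataset:
    """Hypothetical stand-in that mimics the documented selection semantics."""

    def __init__(self, feature_names: Sequence[str], rows: Sequence[Sequence[float]]):
        self._feature_names = list(feature_names)
        self._rows = [list(row) for row in rows]

    @property
    def available_features(self) -> List[str]:
        return list(self._feature_names)

    def __len__(self) -> int:
        return len(self._rows)

    def get_features(
        self,
        idx: Collection[int],
        feature_names: Optional[Collection[str]] = None,
    ) -> List[List[float]]:
        # None means "all available features", as in the documented signature.
        names = self._feature_names if feature_names is None else list(feature_names)
        cols = [self._feature_names.index(name) for name in names]
        return [[self._rows[i][c] for c in cols] for i in idx]

    def get_subset(self, indices: Collection[int]) -> "ToyDataset":
        # A subset is a new dataset restricted to the requested rows.
        return ToyDataset(self._feature_names, [self._rows[i] for i in indices])


# Made-up feature names and values, for illustration only.
ds = ToyDataset(["mw", "n_rings"], [[100.0, 1.0], [200.0, 2.0], [300.0, 0.0]])
selected = ds.get_features([0, 2], feature_names=["n_rings"])
all_columns = ds.get_features([1])
subset = ds.get_subset([1, 2])
```

Here `selected` holds one column for rows 0 and 2, `all_columns` holds every feature for row 1, and `subset` is a two-entry dataset.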
CuratedGlassTempDataset
Bases: AbstractDataset
Dataset for polymer glass transition temperature (Tg) data.
Source code in src/polymetrix/datasets/curated_tg_dataset.py
__init__(feature_levels=ALL_FEATURE_LEVELS, subset=None)
Initialize the Tg dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feature_levels | List[str] | Feature levels to include. | ALL_FEATURE_LEVELS |
subset | Optional[Collection[int]] | Indices to include in the dataset. | None |
Source code in src/polymetrix/datasets/curated_tg_dataset.py
curated_tg_dataset
CuratedGlassTempDataset
Same class as documented under datasets above.
Source code in src/polymetrix/datasets/curated_tg_dataset.py
dataset
AbstractDataset
Same class as documented under datasets above.
Source code in src/polymetrix/datasets/dataset.py
featurizers
base_featurizer
BaseFeatureCalculator
Source code in src/polymetrix/featurizers/base_featurizer.py
aggregate(features)
Aggregates a list of features using the aggregation functions specified in self.agg. If the features are NumPy arrays, the aggregation is applied along the first axis. Otherwise, the aggregation is applied directly, assuming the features are scalar numeric values.
Source code in src/polymetrix/featurizers/base_featurizer.py
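The two aggregation cases described for aggregate (vector-valued features reduced along the first axis, scalar features reduced directly) can be sketched in pure Python. This is an illustrative sketch, not the library's code: the AGG_FUNCS table and the aggregation names in it are assumptions, since this reference does not list the values accepted by self.agg, and nested lists stand in for NumPy arrays. The sketch also assumes a non-empty feature list.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Union

# Hypothetical name-to-function table; the names accepted by self.agg
# are not listed in this reference.
AGG_FUNCS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "mean": mean,
    "min": min,
    "max": max,
}

Feature = Union[float, Sequence[float]]


def aggregate(features: List[Feature], agg: Sequence[str]) -> List[float]:
    """Apply each named aggregation to a non-empty list of features."""
    out: List[float] = []
    vector_valued = isinstance(features[0], (list, tuple))
    for name in agg:
        func = AGG_FUNCS[name]
        if vector_valued:
            # Reduce along the first axis: one aggregated value per column.
            out.extend(func(column) for column in zip(*features))
        else:
            # Scalar features: reduce the list directly.
            out.append(func(features))
    return out


scalar_result = aggregate([1.0, 2.0, 6.0], agg=["mean", "max"])
vector_result = aggregate([[1.0, 10.0], [3.0, 30.0]], agg=["sum"])
```

With scalar inputs the result has one value per aggregation; with vector inputs it has one value per aggregation and column.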
MoleculeFeaturizer
Base class for featurizers that work with general molecules.
Source code in src/polymetrix/featurizers/base_featurizer.py
chemical_featurizer
BalabanJIndex
Bases: GenericScalarFeaturizer
Measures molecular complexity and connectivity of atoms.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
BondCounts
Bases: BaseFeatureCalculator
Counts the number of single, double, and triple bonds in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
BridgingRingsCount
Bases: BaseFeatureCalculator
Counts the number of bridging rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
FpDensityMorgan1
Bases: GenericScalarFeaturizer
Calculates the density of the Morgan1 fingerprint.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
FractionBicyclicRings
Bases: BaseFeatureCalculator
Calculates the fraction of bicyclic rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
HalogenCounts
Bases: BaseFeatureCalculator
Counts the number of halogen atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
HeteroatomCount
Bases: BaseFeatureCalculator
Counts heteroatoms (non-C, non-H) in heterocyclic rings.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
HeteroatomDensity
Bases: BaseFeatureCalculator
Density of heteroatoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
MaxEStateIndex
Bases: GenericScalarFeaturizer
Maximum electrotopological state (E-state) index, reflecting charge distribution.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
MaxRingSize
Bases: BaseFeatureCalculator
Calculates the size of the largest ring in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
MolecularWeight
Bases: GenericScalarFeaturizer
Calculates the molecular weight of the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumAliphaticHeterocycles
Bases: BaseFeatureCalculator
Counts the number of aliphatic heterocycles in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumAromaticRings
Bases: BaseFeatureCalculator
Counts the number of aromatic rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumAtoms
Bases: BaseFeatureCalculator
Counts the number of atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumHBondAcceptors
Bases: GenericScalarFeaturizer
Counts the number of hydrogen bond acceptors.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumHBondDonors
Bases: GenericScalarFeaturizer
Counts the number of hydrogen bond donors.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumNonAromaticRings
Bases: BaseFeatureCalculator
Counts the number of non-aromatic rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumRings
Bases: BaseFeatureCalculator
Counts the number of rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumRotatableBonds
Bases: GenericScalarFeaturizer
Counts the number of rotatable bonds.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
SlogPVSA1
Bases: GenericScalarFeaturizer
Calculates the surface area contributing to octanol solubility, which is linked to lipophilicity.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
SmrVSA5
Bases: GenericScalarFeaturizer
Sums the van der Waals surface area of atoms whose molar refractivity contribution falls in the range 2.45–2.75.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
Sp2CarbonCountFeaturizer
Bases: BaseFeatureCalculator
Counts the number of sp2 hybridized carbon atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
Sp3CarbonCountFeaturizer
Bases: BaseFeatureCalculator
Counts the number of sp3 hybridized carbon atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
TopologicalSurfaceArea
Bases: GenericScalarFeaturizer
Calculates the topological polar surface area.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
comparator
PolymerMoleculeComparator
Comparator that computes various comparison metrics between polymer and molecule features.
Source code in src/polymetrix/featurizers/comparator.py
aggregate(features)
Aggregate features across comparison methods.
Source code in src/polymetrix/featurizers/comparator.py
compare(polymer, molecule)
Return comparison metrics between polymer and molecule features.
Source code in src/polymetrix/featurizers/comparator.py
feature_labels()
Generate labels for comparison and aggregated features.
Source code in src/polymetrix/featurizers/comparator.py
molecule
FullMolecularFeaturizer
Bases: MoleculeFeaturizer
Featurizer for general molecules.
This class can featurize any molecule given as a Molecule object that holds a SMILES string and an RDKit molecule object.
Source code in src/polymetrix/featurizers/molecule.py
featurize(molecule)
Featurize a molecule object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
molecule | | A Molecule object with a mol property containing an RDKit molecule. | required |
Returns:
Type | Description |
---|---|
np.ndarray | Feature vector calculated by the underlying calculator. |
Source code in src/polymetrix/featurizers/molecule.py
Molecule
Represents a general molecule built from a SMILES string.
Attributes:
Name | Type | Description |
---|---|---|
smiles | Optional[str] | The SMILES string representing the molecule. |
mol | Optional[Mol] | The RDKit molecule object (Chem.Mol). |
Raises:
Type | Description |
---|---|
ValueError | If the provided SMILES string is invalid or cannot be processed. |
Source code in src/polymetrix/featurizers/molecule.py
mol
property
Gets the RDKit molecule object.
Returns:
Type | Description |
---|---|
Optional[Mol] | The RDKit molecule object (Chem.Mol), or None if not set. |
smiles
property, writable
Gets the SMILES string of the molecule.
Returns:
Type | Description |
---|---|
Optional[str] | The SMILES string, or None if not set. |
calculate_molecular_weight()
Calculates the exact molecular weight of the molecule.
Returns:
Type | Description |
---|---|
float | The molecular weight of the molecule. |
Source code in src/polymetrix/featurizers/molecule.py
from_smiles(smiles)
classmethod
Creates a Molecule instance from a SMILES string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
smiles | str | The SMILES string representing the molecule. | required |
Returns:
Type | Description |
---|---|
Molecule | A new Molecule object initialized with the given SMILES string. |
Raises:
Type | Description |
---|---|
ValueError | If the SMILES string is invalid. |
Source code in src/polymetrix/featurizers/molecule.py
multiple_featurizer
MultipleFeaturizer
Source code in src/polymetrix/featurizers/multiple_featurizer.py
feature_labels()
Return feature labels with '_with_terminalgroups' suffix when applicable.
Source code in src/polymetrix/featurizers/multiple_featurizer.py
polymer
Polymer
Represents a polymer molecule with its backbone and sidechain information.
Attributes:
Name | Type | Description |
---|---|---|
psmiles | Optional[str] | The pSMILES string of the polymer. |
backbone_terminal_groups | Optional[Dict[str, str]] | Maps connection point patterns to backbone terminal group SMILES. |
sidechain_terminal_groups | Optional[Dict[str, str]] | Maps connection point patterns to sidechain terminal group SMILES. |
graph | Optional[nx.Graph] | The NetworkX graph of the polymer structure. |
backbone_nodes | Optional[List[int]] | Node indices forming the backbone. |
sidechain_nodes | Optional[List[int]] | Node indices forming the sidechains. |
connection_points | Optional[List[int]] | Node indices of connection points. |
_mol | Optional[Chem.Mol] | The RDKit molecule object (internal use). |
Source code in src/polymetrix/featurizers/polymer.py
backbone_molecule
property
Gets the backbone molecule.
backbone_terminal_groups
property, writable
Maps connection point patterns to backbone terminal group SMILES.
full_polymer_mol
property
Gets the full polymer molecule.
mol
property
Returns the full polymer molecule, compatible with featurizers expecting a 'mol' attribute.
psmiles
property, writable
The pSMILES string of the polymer.
sidechain_molecules
property
Gets the sidechain molecules.
sidechain_terminal_groups
property, writable
Maps connection point patterns to sidechain terminal group SMILES.
calculate_molecular_weight()
Calculates the exact molecular weight of the polymer.
Returns:
Type | Description |
---|---|
float | The molecular weight of the polymer. |
from_psmiles(psmiles)
classmethod
Creates a Polymer instance from a pSMILES string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
psmiles | str | The pSMILES string representing the polymer. | required |
Returns:
Type | Description |
---|---|
Polymer | A new Polymer instance. |
Raises:
Type | Description |
---|---|
ValueError | If the pSMILES string is invalid. |
Source code in src/polymetrix/featurizers/polymer.py
get_backbone_and_sidechain_graphs()
Extracts NetworkX graphs for the backbone and sidechains.
Returns:
Type | Description |
---|---|
Tuple[Graph, List[Graph]] | A tuple of (backbone graph, list of sidechain graphs). |
Source code in src/polymetrix/featurizers/polymer.py
get_backbone_and_sidechain_molecules()
Extracts RDKit molecules for the backbone and sidechains.
Returns:
Type | Description |
---|---|
Tuple[List[Mol], List[Mol]] | A tuple of (list of backbone molecules, list of sidechain molecules). |
Source code in src/polymetrix/featurizers/polymer.py
get_connection_points()
Gets the connection point node indices.
Returns:
Type | Description |
---|---|
List[int] | List of node indices representing connection points. |
add_degree_one_nodes_to_backbone(graph, backbone)
Adds degree-1 nodes connected to backbone nodes to the backbone list, avoiding duplicates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph | Graph | The input graph to analyze. | required |
backbone | List[int] | Initial list of backbone node indices. | required |
Returns:
Type | Description |
---|---|
List[int] | Updated backbone list including degree-1 nodes, with no duplicates. |
Source code in src/polymetrix/featurizers/polymer.py
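The behaviour described for add_degree_one_nodes_to_backbone can be sketched without NetworkX by using an adjacency dictionary. This is a minimal sketch under stated assumptions, not the library's implementation: the function name add_degree_one_nodes_sketch and the dict-of-lists graph are hypothetical stand-ins for the documented Graph argument.

```python
from typing import Dict, List


def add_degree_one_nodes_sketch(adj: Dict[int, List[int]], backbone: List[int]) -> List[int]:
    """Append leaf neighbours of backbone nodes to a copy of the backbone list."""
    updated = list(backbone)
    seen = set(backbone)
    for node in backbone:
        for nbr in adj[node]:
            # A degree-1 node has exactly one neighbour in the graph.
            if len(adj[nbr]) == 1 and nbr not in seen:
                updated.append(nbr)
                seen.add(nbr)
    return updated


# Toy graph: node 2 is a leaf on backbone node 1; node 4 continues into a chain.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3, 5], 5: [4]}
extended = add_degree_one_nodes_sketch(adj, backbone=[0, 1, 3])
```

In the toy graph only node 2 is added: node 0 is already in the backbone, and node 4 has degree 2.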
attach_terminal_to_atom(mol, target_idx, terminal_mol, attachment_idx=None)
Attaches a terminal group to a specific atom in the molecule.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mol | RWMol | The molecule being modified. | required |
target_idx | int | Index of the target atom to attach the terminal group. | required |
terminal_mol | Mol | The terminal group molecule. | required |
attachment_idx | int | Index of the attachment point in the terminal group (optional for sidechains). | None |
Returns:
Type | Description |
---|---|
RWMol | The modified molecule. |
Source code in src/polymetrix/featurizers/polymer.py
classify_backbone_and_sidechains(graph)
Classifies nodes into backbone and sidechain components based on paths and cycles.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph | Graph | The input graph to classify. | required |
Returns:
Type | Description |
---|---|
Tuple[List[int], List[int]] | A tuple of (backbone nodes, sidechain nodes). |
Source code in src/polymetrix/featurizers/polymer.py
find_cycles_including_paths(graph, paths)
Identifies cycles that include nodes from the given paths.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph | Graph | The input graph to analyze. | required |
paths | List[List[int]] | List of paths whose nodes are used to filter cycles. | required |
Returns:
Type | Description |
---|---|
List[List[int]] | List of cycles, where each cycle is a list of node indices. |
Source code in src/polymetrix/featurizers/polymer.py
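The filtering step described for find_cycles_including_paths can be sketched as a set intersection. This sketch makes two assumptions that this reference does not confirm: the cycles are already enumerated (the documented function takes a graph and presumably finds them itself), and a cycle qualifies as soon as it shares at least one node with any path. The name cycles_touching_paths is hypothetical.

```python
from typing import List


def cycles_touching_paths(cycles: List[List[int]], paths: List[List[int]]) -> List[List[int]]:
    """Keep the cycles that share at least one node with the given paths."""
    path_nodes = {node for path in paths for node in path}
    return [cycle for cycle in cycles if path_nodes.intersection(cycle)]


# Toy input: the first ring shares nodes 1 and 2 with the path, the second shares none.
cycles = [[1, 2, 5], [7, 8, 9]]
paths = [[0, 1, 2, 3]]
kept = cycles_touching_paths(cycles, paths)
```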
find_shortest_paths_between_stars(graph)
Finds shortest paths between all pairs of asterisk (*) nodes in the graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph | Graph | The input graph to analyze. | required |
Returns:
Type | Description |
---|---|
List[List[int]] | List of shortest paths, where each path is a list of node indices. |
Source code in src/polymetrix/featurizers/polymer.py
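The idea behind find_shortest_paths_between_stars can be illustrated with a breadth-first search over an adjacency dictionary. This is a sketch under stated assumptions rather than the library's code: the adjacency dict, the symbols mapping used to mark asterisk nodes, and both function names are hypothetical, and only one shortest path is returned per pair of star nodes.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Optional


def shortest_path(adj: Dict[int, List[int]], source: int, target: int) -> Optional[List[int]]:
    """Breadth-first search that returns one shortest path, or None if unreachable."""
    parent: Dict[int, Optional[int]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            current: Optional[int] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


def shortest_paths_between_stars(adj: Dict[int, List[int]], symbols: Dict[int, str]) -> List[List[int]]:
    """Collect one shortest path for every pair of nodes labelled '*'."""
    stars = [node for node, symbol in symbols.items() if symbol == "*"]
    paths = []
    for a, b in combinations(stars, 2):
        path = shortest_path(adj, a, b)
        if path is not None:
            paths.append(path)
    return paths


# Toy repeat unit: *-C-C-* with one extra carbon branching off node 1.
symbols = {0: "*", 1: "C", 2: "C", 3: "*", 4: "C"}
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
star_paths = shortest_paths_between_stars(adj, symbols)
```

The branch node 4 does not appear in the result because it lies off the shortest route between the two star nodes.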
insert_terminal_group(mol, terminal_groups, is_sidechain=False)
Inserts terminal groups into a molecule by replacing connection points or attaching to sidechains.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mol | Mol | The RDKit molecule to modify. | required |
terminal_groups | Dict[str, str] | Dictionary mapping patterns to terminal group SMILES. | required |
is_sidechain | bool | If True, attach terminal groups to sidechains; else, replace backbone connection points. | False |
Returns:
Type | Description |
---|---|
Mol | A new RDKit molecule with terminal groups inserted. |
Source code in src/polymetrix/featurizers/polymer.py
replace_asterisk_with_terminal(mol, asterisk_idx, terminal_mol, attachment_idx)
Replaces a single asterisk atom with a terminal group.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mol | RWMol | The molecule being modified. | required |
asterisk_idx | int | Index of the asterisk atom to replace. | required |
terminal_mol | Mol | The terminal group molecule. | required |
attachment_idx | int | Index of the attachment point in the terminal group. | required |
Returns:
Type | Description |
---|---|
RWMol | The modified molecule. |
Source code in src/polymetrix/featurizers/polymer.py
sidechain_backbone_featurizer
SidechainDiversityFeaturizer
Bases: BaseFeatureCalculator
Computes the number of structurally diverse sidechains in a polymer based on graph isomorphism.
Source code in src/polymetrix/featurizers/sidechain_backbone_featurizer.py
SidechainLengthToStarAttachmentDistanceRatioFeaturizer
Bases: BaseFeatureCalculator
Computes aggregated ratios of sidechain lengths to the shortest backbone distance from the polymer's star node (*) to each sidechain's attachment point.
Source code in src/polymetrix/featurizers/sidechain_backbone_featurizer.py
StarToSidechainMinDistanceFeaturizer
Bases: BaseFeatureCalculator
Computes aggregated minimum backbone distances from star nodes (*) to sidechains in a polymer.
Source code in src/polymetrix/featurizers/sidechain_backbone_featurizer.py
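The two distance-based quantities described above (minimum backbone distance from a star node to a sidechain, and the ratio of sidechain length to that distance) can be worked through on a toy graph. Everything in this sketch is an assumption made for illustration: the adjacency dict, the mapping from attachment node to sidechain nodes, the use of node count as sidechain length, hop count as distance, and mean as the aggregation. It is not the featurizers' actual code.

```python
from collections import deque
from statistics import mean
from typing import Dict, List


def backbone_distances(adj: Dict[int, List[int]], backbone: List[int], source: int) -> Dict[int, int]:
    """Breadth-first hop counts from `source`, walking only over backbone nodes."""
    allowed = set(backbone)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr in allowed and nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


# Toy polymer: backbone 0-1-2-3-4 with star nodes 0 and 4.
adj = {0: [1], 1: [0, 2, 5], 2: [1, 3, 7], 3: [2, 4], 4: [3], 5: [1, 6], 6: [5], 7: [2]}
backbone = [0, 1, 2, 3, 4]
stars = [0, 4]
# Hypothetical layout: backbone attachment node -> nodes of the sidechain hanging off it.
sidechains = {1: [5, 6], 2: [7]}

per_star = [backbone_distances(adj, backbone, star) for star in stars]
# Minimum over star nodes of the backbone distance to each attachment point.
min_dist = {attach: min(d[attach] for d in per_star) for attach in sidechains}
# Sidechain length (node count) divided by that minimum distance.
ratios = [len(nodes) / min_dist[attach] for attach, nodes in sidechains.items()]
aggregated = {
    "mean_min_distance": mean(list(min_dist.values())),
    "mean_ratio": mean(ratios),
}
```

The sketch assumes no sidechain attaches directly to a star node, which would make the distance zero and the ratio undefined.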
splitters
splitters
PolymerClassSplitter
Bases: BaseSplitter
Splitter based on polymer class.
Source code in src/polymetrix/splitters/splitters.py
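A class-based split keeps all entries of one polymer class on the same side of the split. The following is a minimal sketch of that grouping idea only; the helper names, the class labels and the way held-out classes are chosen are hypothetical, and this reference does not describe how PolymerClassSplitter or BaseSplitter actually assign groups.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def group_indices_by_class(classes: Sequence[str]) -> Dict[str, List[int]]:
    """Map each class label to the indices of the entries carrying it."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(classes):
        groups[label].append(index)
    return dict(groups)


def split_by_group(groups: Dict[str, List[int]], test_groups: Set[str]) -> Tuple[List[int], List[int]]:
    """Send whole groups to either the train or the test side."""
    train: List[int] = []
    test: List[int] = []
    for label, indices in groups.items():
        (test if label in test_groups else train).extend(indices)
    return train, test


# Made-up class labels for four dataset entries.
classes = ["polyester", "polyamide", "polyester", "polyolefin"]
groups = group_indices_by_class(classes)
train_idx, test_idx = split_by_group(groups, test_groups={"polyamide"})
```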
TgSplitter
Bases: BaseSplitter
Splitter based on Tg values.
Source code in src/polymetrix/splitters/splitters.py
__init__(ds, tg_q=None, label_name='labels.Exp_Tg(K)', shuffle=True, random_state=None, **kwargs)
Initialize TgSplitter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ds | AbstractDataset | Dataset to split | required |
tg_q | Optional[Collection[float]] | Quantiles to bin Tg values into groups | None |
label_name | str | Name of the label to use for splitting | 'labels.Exp_Tg(K)' |
shuffle | bool | Whether to shuffle the dataset | True |
random_state | Optional[Union[int, RandomState]] | Random state for shuffling | None |
**kwargs | | Additional arguments to pass to BaseSplitter | {} |
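The tg_q parameter bins Tg values into groups by quantile. One way to picture this is the sketch below, which computes quantile edges by linear interpolation and assigns each value to the bin it falls in. The helper names, the interpolation rule and the Tg values are assumptions for illustration; this reference does not state how TgSplitter computes its bins.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between the two nearest order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bin_by_quantiles(values: Sequence[float], qs: Sequence[float]) -> List[int]:
    """Return a group index per value; values equal to an edge go to the upper bin."""
    ordered = sorted(values)
    edges = [quantile(ordered, q) for q in qs]
    return [bisect_right(edges, value) for value in values]


# Made-up Tg values in kelvin.
tg_values = [250.0, 300.0, 350.0, 400.0, 450.0]
two_groups = bin_by_quantiles(tg_values, qs=[0.5])
three_groups = bin_by_quantiles(tg_values, qs=[0.25, 0.75])
```

Each resulting group index could then serve as the grouping key for a split, so that train and test sets cover different Tg ranges.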