API Reference
datasets
AbstractDataset
Bases: ABC
Base class for polymer datasets.
Source code in src/polymetrix/datasets/dataset.py
available_features
property
List of available features.
Returns:
Type | Description |
---|---|
list[str] | List of feature names. |
available_labels
property
List of available labels.
Returns:
Type | Description |
---|---|
list[str] | List of label names. |
meta_info
property
List of available metadata fields.
Returns:
Type | Description |
---|---|
list[str] | List of metadata field names. |
psmiles
property
Returns the polymer SMILES strings.
Returns:
Type | Description |
---|---|
np.ndarray | Array of polymer SMILES strings. |
__init__()
Initialize a dataset.
__iter__()
__len__()
get_features(idx, feature_names=None)
Get features for specified indices.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | Collection[int] | Indices of entries. | required |
feature_names | Optional[Collection[str]] | Names of features to return. If None, returns all available features. | None |
Returns:
Type | Description |
---|---|
np.ndarray | Array of feature values. |
Source code in src/polymetrix/datasets/dataset.py
get_labels(idx, label_names=None)
Get labels for specified indices.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | Collection[int] | Indices of entries. | required |
label_names | Optional[Collection[str]] | Names of labels to return. If None, returns all available labels. | None |
Returns:
Type | Description |
---|---|
np.ndarray | Array of label values. |
Source code in src/polymetrix/datasets/dataset.py
get_meta(idx, meta_keys=None)
Get metadata for specified indices.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | Collection[int] | Indices of entries. | required |
meta_keys | Optional[Collection[str]] | Names of metadata fields to return. If None, returns all available metadata. | None |
Returns:
Type | Description |
---|---|
np.ndarray | Array of metadata values. |
Source code in src/polymetrix/datasets/dataset.py
get_subset(indices)
Get a subset of the dataset.
Source code in src/polymetrix/datasets/dataset.py
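The accessor pattern documented for AbstractDataset can be illustrated with a small stand-in. This is a minimal sketch, not the polymetrix implementation: `ToyDataset`, its in-memory storage, and the column names `f_a`, `f_b`, and `tg` are hypothetical and only mirror the documented signatures of `get_features`, `get_labels`, and `get_subset`.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in that mirrors the documented accessor signatures."""

    def __init__(self, features, feature_names, labels, label_names):
        self._features = np.asarray(features, dtype=float)
        self._labels = np.asarray(labels, dtype=float)
        self._feature_names = list(feature_names)
        self._label_names = list(label_names)

    @property
    def available_features(self):
        return list(self._feature_names)

    @property
    def available_labels(self):
        return list(self._label_names)

    def __len__(self):
        return len(self._features)

    def _select(self, data, names, idx, wanted):
        # None means "all columns", matching the documented default.
        cols = range(len(names)) if wanted is None else [names.index(n) for n in wanted]
        return data[np.ix_(list(idx), list(cols))]

    def get_features(self, idx, feature_names=None):
        return self._select(self._features, self._feature_names, idx, feature_names)

    def get_labels(self, idx, label_names=None):
        return self._select(self._labels, self._label_names, idx, label_names)

    def get_subset(self, indices):
        indices = list(indices)
        return ToyDataset(self._features[indices], self._feature_names,
                          self._labels[indices], self._label_names)


ds = ToyDataset(
    features=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    feature_names=["f_a", "f_b"],
    labels=[[300.0], [350.0], [400.0]],
    label_names=["tg"],
)
first_two_b = ds.get_features([0, 1], feature_names=["f_b"])  # rows 0 and 1, column f_b
subset = ds.get_subset([2])  # new dataset holding only entry 2
```

Returning a two-dimensional array even for a single requested column keeps the output shape predictable for downstream models, which is one plausible reason for an interface of this kind.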
CuratedGlassTempDataset
Bases: AbstractDataset
Dataset for polymer glass transition temperature (Tg) data.
Source code in src/polymetrix/datasets/curated_tg_dataset.py
__init__(feature_levels=ALL_FEATURE_LEVELS, subset=None)
Initialize the Tg dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feature_levels | List[str] | Feature levels to include. | ALL_FEATURE_LEVELS |
subset | Optional[Collection[int]] | Indices to include in the dataset. | None |
Source code in src/polymetrix/datasets/curated_tg_dataset.py
featurizers
base_featurizer
BaseFeatureCalculator
Source code in src/polymetrix/featurizers/base_featurizer.py
aggregate(features)
Aggregates a list of features using the aggregation functions specified in self.agg. If the features are NumPy arrays, the aggregation is applied along the first axis. Otherwise, the aggregation is applied directly, assuming the features are scalar numeric values.
Source code in src/polymetrix/featurizers/base_featurizer.py
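The two aggregation branches described for `aggregate` can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the `AGG_FUNCS` mapping, the `agg` argument (standing in for `self.agg`), and the set of supported aggregation names are hypothetical.

```python
import numpy as np

# Hypothetical name-to-function mapping; the aggregations actually supported
# by polymetrix are not listed in this reference.
AGG_FUNCS = {"mean": np.mean, "min": np.min, "max": np.max, "sum": np.sum}


def aggregate(features, agg=("mean", "max")):
    """Aggregate a list of features with each function named in `agg`."""
    results = []
    for name in agg:
        func = AGG_FUNCS[name]
        if isinstance(features[0], np.ndarray):
            # Array-valued features: stack them and reduce along the first axis.
            results.append(func(np.stack(features), axis=0))
        else:
            # Scalar features: reduce the list directly.
            results.append(func(features))
    return results


scalar_result = aggregate([1.0, 2.0, 6.0])
array_result = aggregate([np.array([1.0, 4.0]), np.array([3.0, 2.0])])
```

With array inputs, each aggregation yields one value per array position, so per-sidechain or per-fragment vectors collapse into a single fixed-length vector per polymer.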
chemical_featurizer
BalabanJIndex
Bases: GenericScalarFeaturizer
Measures molecular complexity and connectivity of atoms.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
BondCounts
Bases: BaseFeatureCalculator
Counts the number of single, double, and triple bonds in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
BridgingRingsCount
Bases: BaseFeatureCalculator
Counts the number of bridging rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
FpDensityMorgan1
Bases: GenericScalarFeaturizer
Calculates the density of the Morgan1 fingerprint.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
FractionBicyclicRings
Bases: BaseFeatureCalculator
Calculates the fraction of bicyclic rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
HalogenCounts
Bases: BaseFeatureCalculator
Counts the number of halogen atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
HeteroatomCount
Bases: BaseFeatureCalculator
Counts heteroatoms (non-C, non-H) in heterocyclic rings.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
HeteroatomDensity
Bases: BaseFeatureCalculator
Density of heteroatoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
MaxEStateIndex
Bases: GenericScalarFeaturizer
Maximum electrotopological state (E-state) index, reflecting charge distribution.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
MaxRingSize
Bases: BaseFeatureCalculator
Calculates the size of the largest ring in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
MolecularWeight
Bases: GenericScalarFeaturizer
Calculates the molecular weight of the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumAliphaticHeterocycles
Bases: BaseFeatureCalculator
Counts the number of aliphatic heterocycles in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumAromaticRings
Bases: BaseFeatureCalculator
Counts the number of aromatic rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumAtoms
Bases: BaseFeatureCalculator
Counts the number of atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumHBondAcceptors
Bases: GenericScalarFeaturizer
Counts the number of hydrogen bond acceptors.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumHBondDonors
Bases: GenericScalarFeaturizer
Counts the number of hydrogen bond donors.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumNonAromaticRings
Bases: BaseFeatureCalculator
Counts the number of non-aromatic rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumRings
Bases: BaseFeatureCalculator
Counts the number of rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumRotatableBonds
Bases: GenericScalarFeaturizer
Counts the number of rotatable bonds.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
SlogPVSA1
Bases: GenericScalarFeaturizer
Calculates the surface area contributing to octanol solubility, linked to lipophilicity.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
SmrVSA5
Bases: GenericScalarFeaturizer
Sum of the surface area of atoms whose molar refractivity contribution lies in the range 2.45 to 2.75.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
Sp2CarbonCountFeaturizer
Bases: BaseFeatureCalculator
Counts the number of sp2 hybridized carbon atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
Sp3CarbonCountFeaturizer
Bases: BaseFeatureCalculator
Counts the number of sp3 hybridized carbon atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
TopologicalSurfaceArea
Bases: GenericScalarFeaturizer
Calculates the topological polar surface area.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
polymer
Polymer
A class to represent a polymer molecule and extract its backbone and sidechain information.
Attributes:
Name | Type | Description |
---|---|---|
psmiles | Optional[str] | Optional[str], the pSMILES string representing the polymer molecule. |
graph | Graph | Optional[nx.Graph], a NetworkX graph representing the polymer structure. |
backbone_nodes | List[int] | Optional[List[int]], list of node indices forming the polymer backbone. |
sidechain_nodes | List[int] | Optional[List[int]], list of node indices forming the sidechains. |
connection_points | List[int] | Optional[List[int]], list of node indices representing connection points. |
Raises:
Type | Description |
---|---|
ValueError | If the provided pSMILES string is invalid or cannot be processed. |
Source code in src/polymetrix/featurizers/polymer.py
backbone_nodes
property
Gets the list of backbone node indices.
Returns:
Type | Description |
---|---|
List[int] | List of node indices representing the backbone. |
graph
property
Gets the NetworkX graph of the polymer.
Returns:
Type | Description |
---|---|
nx.Graph | The graph representing the polymer structure. |
psmiles
property
writable
Gets the pSMILES string of the polymer.
Returns:
Type | Description |
---|---|
Optional[str] | The pSMILES string, or None if not set. |
sidechain_nodes
property
Gets the list of sidechain node indices.
Returns:
Type | Description |
---|---|
List[int] | List of node indices representing the sidechains. |
calculate_molecular_weight()
Calculates the exact molecular weight of the polymer.
Returns:
Type | Description |
---|---|
float | The molecular weight of the polymer molecule. |
Source code in src/polymetrix/featurizers/polymer.py
from_psmiles(psmiles)
classmethod
Creates a Polymer instance from a pSMILES string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
psmiles | str | The pSMILES string representing the polymer molecule. | required |
Returns:
Type | Description |
---|---|
Polymer | A new Polymer object initialized with the given pSMILES string. |
Raises:
Type | Description |
---|---|
ValueError | If the pSMILES string is invalid. |
Source code in src/polymetrix/featurizers/polymer.py
get_backbone_and_sidechain_graphs()
Extracts NetworkX graphs for the backbone and sidechains.
Returns:
Type | Description |
---|---|
Tuple[nx.Graph, List[nx.Graph]] | A tuple containing the backbone graph and a list of sidechain graphs. |
Source code in src/polymetrix/featurizers/polymer.py
get_backbone_and_sidechain_molecules()
Extracts RDKit molecule objects for the backbone and sidechains.
Returns:
Type | Description |
---|---|
Tuple[List[Chem.Mol], List[Chem.Mol]] | A tuple containing a list with the backbone molecule and a list of sidechain molecules. |
Source code in src/polymetrix/featurizers/polymer.py
get_connection_points()
Gets the list of connection point node indices.
Returns:
Type | Description |
---|---|
List[int] | List of node indices representing connection points. |
add_degree_one_nodes_to_backbone(graph, backbone)
Adds degree-1 nodes connected to backbone nodes to the backbone list.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph | nx.Graph | The input graph to analyze. | required |
backbone | List[int] | The initial list of backbone node indices. | required |
Returns:
Type | Description |
---|---|
List[int] | The updated backbone list including degree-1 nodes. |
Source code in src/polymetrix/featurizers/polymer.py
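The degree-1 extension step can be sketched without any graph library. In the sketch below the graph is a plain adjacency dictionary rather than an `nx.Graph`; that substitution and the toy graph are assumptions made to keep the example dependency-free, and the library's own implementation may differ.

```python
def add_degree_one_nodes_to_backbone(adjacency, backbone):
    """Return `backbone` extended by leaf nodes attached to a backbone node.

    `adjacency` maps each node index to the set of its neighbors
    (a stand-in for an nx.Graph).
    """
    original = set(backbone)
    updated = list(backbone)
    for node, neighbors in adjacency.items():
        if node in original:
            continue
        # A degree-1 node has exactly one neighbor; keep it only if that
        # neighbor already belongs to the backbone.
        if len(neighbors) == 1 and next(iter(neighbors)) in original:
            updated.append(node)
    return updated


# Toy graph: chain 0-1-2-3 with a two-atom branch 1-4-5.
adjacency = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2}, 4: {1, 5}, 5: {4}}
extended = add_degree_one_nodes_to_backbone(adjacency, [1, 2])
```

Nodes 0 and 3 are leaves hanging off backbone nodes and are added; node 5 is also a leaf, but its only neighbor (node 4) is not in the backbone, so it stays out.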
classify_backbone_and_sidechains(graph)
Classifies nodes into backbone and sidechain components based on paths and cycles.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph | nx.Graph | The input graph to classify. | required |
Returns:
Type | Description |
---|---|
Tuple[List[int], List[int]] | A tuple containing the list of backbone nodes and the list of sidechain nodes. |
Source code in src/polymetrix/featurizers/polymer.py
find_cycles_including_paths(graph, paths)
Identifies cycles in the graph that include nodes from the given paths.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph | nx.Graph | The input graph to analyze. | required |
paths | List[List[int]] | List of paths whose nodes are used to filter cycles. | required |
Returns:
Type | Description |
---|---|
List[List[int]] | A list of unique cycles, where each cycle is a list of node indices. |
Source code in src/polymetrix/featurizers/polymer.py
find_shortest_paths_between_stars(graph)
Finds shortest paths between all pairs of asterisk (*) nodes in the graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph | nx.Graph | The input graph to analyze. | required |
Returns:
Type | Description |
---|---|
List[List[int]] | A list of shortest paths, where each path is a list of node indices. |
Source code in src/polymetrix/featurizers/polymer.py
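A shortest-path search between star nodes can be sketched with a breadth-first search. The example below is an assumption-laden illustration, not the library's code: it uses an adjacency dictionary in place of an `nx.Graph`, and the `elements` mapping from node index to atom symbol is a hypothetical way of marking which nodes are asterisks.

```python
from collections import deque
from itertools import combinations


def shortest_path(adjacency, source, target):
    """Breadth-first search returning one shortest path, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def find_shortest_paths_between_stars(adjacency, elements):
    """Collect one shortest path for every pair of '*' nodes."""
    stars = [node for node, symbol in elements.items() if symbol == "*"]
    paths = []
    for start, end in combinations(stars, 2):
        path = shortest_path(adjacency, start, end)
        if path is not None:
            paths.append(path)
    return paths


# Toy graph for the pSMILES *CC(C)*: chain 0(*)-1(C)-2(C)-4(*), methyl 3 on atom 2.
elements = {0: "*", 1: "C", 2: "C", 3: "C", 4: "*"}
adjacency = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2]}
paths = find_shortest_paths_between_stars(adjacency, elements)
```

In this toy case the single path runs through atoms 0, 1, 2, and 4, leaving the methyl carbon (node 3) off the path, which is the kind of node a backbone and sidechain classification would treat as a sidechain.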
sidechain_backbone_featurizer
SidechainDiversityFeaturizer
Bases: BaseFeatureCalculator
Computes the number of structurally diverse sidechains in a polymer based on graph isomorphism.
Source code in src/polymetrix/featurizers/sidechain_backbone_featurizer.py
SidechainLengthToStarAttachmentDistanceRatioFeaturizer
Bases: BaseFeatureCalculator
Computes aggregated ratios of sidechain lengths to the shortest backbone distance from the polymer's star node (*) to each sidechain's attachment point.
Source code in src/polymetrix/featurizers/sidechain_backbone_featurizer.py
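The ratio described above reduces to simple arithmetic once sidechain lengths and star-to-attachment distances are known. The sketch below assumes both are already available as plain lists; the function names, the handling of zero distances, and the choice of mean, min, and max as aggregations are hypothetical and not taken from the library.

```python
from statistics import mean


def length_to_distance_ratios(sidechain_lengths, star_distances):
    """Divide each sidechain length by its star-to-attachment distance."""
    ratios = []
    for length, distance in zip(sidechain_lengths, star_distances):
        # Skip zero distances to avoid division by zero; how the library
        # treats this case is not documented in this reference.
        if distance > 0:
            ratios.append(length / distance)
    return ratios


def aggregate_ratios(ratios):
    """Summarize the per-sidechain ratios into fixed-size features."""
    if not ratios:
        return {"mean": 0.0, "min": 0.0, "max": 0.0}
    return {"mean": mean(ratios), "min": min(ratios), "max": max(ratios)}


# Three sidechains with 3, 1, and 4 atoms, attached 1, 2, and 2 bonds from a star node.
ratios = length_to_distance_ratios([3, 1, 4], [1, 2, 2])
summary = aggregate_ratios(ratios)
```

Aggregating is what turns a variable number of sidechains into a feature vector of constant length, so polymers with different numbers of sidechains remain comparable.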
StarToSidechainMinDistanceFeaturizer
Bases: BaseFeatureCalculator
Computes aggregated minimum backbone distances from star nodes (*) to sidechains in a polymer.
Source code in src/polymetrix/featurizers/sidechain_backbone_featurizer.py
splitters
splitters
PolymerClassSplitter
Bases: BaseSplitter
Splitter based on polymer class.
Source code in src/polymetrix/splitters/splitters.py
TgSplitter
Bases: BaseSplitter
Splitter based on Tg values.
Source code in src/polymetrix/splitters/splitters.py
__init__(ds, tg_q=None, label_name='labels.Exp_Tg(K)', shuffle=True, random_state=None, **kwargs)
Initialize TgSplitter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ds | AbstractDataset | Dataset to split. | required |
tg_q | Optional[Collection[float]] | Quantiles used to bin Tg values into groups. | None |
label_name | str | Name of the label to use for splitting. | 'labels.Exp_Tg(K)' |
shuffle | bool | Whether to shuffle the dataset. | True |
random_state | Optional[Union[int, RandomState]] | Random state for shuffling. | None |
**kwargs | | Additional arguments to pass to BaseSplitter. | {} |
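Binning Tg values by quantiles, as the `tg_q` parameter suggests, can be sketched with NumPy. This is one plausible reading of the parameter, not the TgSplitter implementation: the helper name `bin_by_quantiles` and the sample Tg values are hypothetical.

```python
import numpy as np


def bin_by_quantiles(values, quantiles):
    """Assign each value to a group delimited by the given quantiles."""
    values = np.asarray(values, dtype=float)
    # Turn quantile levels (e.g. 0.5) into bin edges in the units of `values`.
    edges = np.quantile(values, quantiles)
    # np.digitize returns, for each value, the index of the bin it falls into.
    return np.digitize(values, edges)


tg_values = [250.0, 300.0, 350.0, 400.0, 450.0, 500.0]
groups = bin_by_quantiles(tg_values, [0.5])
```

With a single quantile at 0.5, the values split into a low-Tg group (label 0) and a high-Tg group (label 1). Holding out an entire group tests how well a model extrapolates to a Tg range it has not seen.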