API Reference¶
TSNE_embeddings
¶
parse_embedding(embedding_str)
¶
Parse embedding string to numpy array
Source code in src/polymetrix/TSNE_embeddings.py
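The reference does not show the string format that `parse_embedding` accepts. As a minimal sketch, assuming each embedding is stored as a Python-style list literal (as often happens after a CSV round trip), the parsing step could look like the following. `parse_embedding_sketch` is a hypothetical stand-in, not the library function.

```python
import ast

import numpy as np


def parse_embedding_sketch(embedding_str: str) -> np.ndarray:
    """Hypothetical stand-in: parse a list-literal string into a 1-D float array."""
    # literal_eval only evaluates Python literals, so it does not execute code.
    values = ast.literal_eval(embedding_str.strip())
    return np.asarray(values, dtype=float)


vec = parse_embedding_sketch("[0.12, -0.5, 3.0]")
print(vec.shape)
```

If the real column uses another layout (for example whitespace-separated numbers), the parsing line would need to change accordingly.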
process_embeddings(data, label)
¶
Process embeddings and concatenate them
Source code in src/polymetrix/TSNE_embeddings.py
datasets
¶
AbstractDataset
¶
Bases: ABC
Base class for polymer datasets.
Source code in src/polymetrix/datasets/dataset.py
available_features
property
¶
List of available features.
Returns:
| Type | Description |
|---|---|
| list[str] | List of feature names |
available_labels
property
¶
List of available labels.
Returns:
| Type | Description |
|---|---|
| list[str] | List of label names |
meta_info
property
¶
List of available metadata fields.
Returns:
| Type | Description |
|---|---|
| list[str] | List of metadata field names |
psmiles
property
¶
Return the polymer SMILES strings.
Returns:
| Type | Description |
|---|---|
| np.ndarray | Array of polymer SMILES strings |
__init__()
¶
Initialize a dataset.
__iter__()
¶
__len__()
¶
_load_data(subset=None)
abstractmethod
¶
Load and prepare the dataset-specific data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subset | Optional[Collection[int]] | Indices to include in the dataset. | None |
get_features(idx, feature_names=None)
¶
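Get features for specified indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | Collection[int] | Indices of entries. | required |
| feature_names | Optional[Collection[str]] | Names of features to return. If None, returns all available features. | None |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Array of feature values. |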
Source code in src/polymetrix/datasets/dataset.py
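To illustrate the `idx` / `feature_names` contract of `get_features`, here is a toy stand-in (not the real `AbstractDataset`) that keeps features in a 2-D array and selects columns by name. The class name, the feature names, and the internal layout are assumptions made for this sketch.

```python
from typing import Collection, Optional

import numpy as np


class ToyDataset:
    """Hypothetical stand-in mimicking the documented get_features contract."""

    def __init__(self) -> None:
        self._feature_names = ["feat_a", "feat_b", "feat_c"]
        # One row per entry, one column per feature.
        self._features = np.arange(12, dtype=float).reshape(4, 3)

    @property
    def available_features(self) -> list:
        return list(self._feature_names)

    def get_features(
        self, idx: Collection[int], feature_names: Optional[Collection[str]] = None
    ) -> np.ndarray:
        # None means "all available features", as the docstring states.
        names = self._feature_names if feature_names is None else list(feature_names)
        cols = [self._feature_names.index(name) for name in names]
        return self._features[np.ix_(list(idx), cols)]


ds = ToyDataset()
all_feats = ds.get_features([0, 2])
one_feat = ds.get_features([0, 2], feature_names=["feat_b"])
print(all_feats.shape, one_feat.shape)
```

`get_labels` and `get_meta` are documented with the same pattern: row indices first, then an optional collection of names that defaults to everything available.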
get_labels(idx, label_names=None)
¶
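Get labels for specified indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | Collection[int] | Indices of entries. | required |
| label_names | Optional[Collection[str]] | Names of labels to return. If None, returns all available labels. | None |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Array of label values. |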
Source code in src/polymetrix/datasets/dataset.py
get_meta(idx, meta_keys=None)
¶
Get metadata for specified indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | Collection[int] | Indices of entries. | required |
| meta_keys | Optional[Collection[str]] | Names of metadata fields to return. If None, returns all available metadata. | None |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Array of metadata values. |
Source code in src/polymetrix/datasets/dataset.py
get_subset(indices)
¶
Get a subset of the dataset.
Source code in src/polymetrix/datasets/dataset.py
CuratedGlassTempDataset
¶
Bases: AbstractDataset
Dataset for polymer glass transition temperature (Tg) data.
Source code in src/polymetrix/datasets/curated_tg_dataset.py
__init__(feature_levels=ALL_FEATURE_LEVELS, subset=None)
¶
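Initialize the Tg dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| feature_levels | List[str] | Feature levels to include | ALL_FEATURE_LEVELS |
| subset | Optional[Collection[int]] | Indices to include in the dataset | None |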
Source code in src/polymetrix/datasets/curated_tg_dataset.py
_filter_columns(prefixes)
¶
Helper to filter columns by prefix(es).
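A minimal sketch of prefix-based column filtering, assuming column names carry a level prefix such as `labels.` (the name `labels.Exp_Tg(K)` appears in the `TgSplitter` signature; the other column names below are made up for illustration, and `filter_columns_sketch` is a hypothetical helper).

```python
from typing import Iterable, List, Union


def filter_columns_sketch(
    columns: Iterable[str], prefixes: Union[str, Iterable[str]]
) -> List[str]:
    """Hypothetical helper: keep the columns that start with any given prefix."""
    # Accept one prefix or several; str.startswith takes a tuple of options.
    prefix_tuple = (prefixes,) if isinstance(prefixes, str) else tuple(prefixes)
    return [col for col in columns if col.startswith(prefix_tuple)]


columns = ["labels.Exp_Tg(K)", "meta.source", "features.num_atoms", "features.num_rings"]
print(filter_columns_sketch(columns, "features."))
print(filter_columns_sketch(columns, ["labels.", "meta."]))
```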
_load_data(subset=None)
¶
Load and prepare the dataset.
Source code in src/polymetrix/datasets/curated_tg_dataset.py
curated_tg_dataset
¶
CuratedGlassTempDataset
¶
Bases: AbstractDataset
Dataset for polymer glass transition temperature (Tg) data.
Source code in src/polymetrix/datasets/curated_tg_dataset.py
__init__(feature_levels=ALL_FEATURE_LEVELS, subset=None)
¶
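Initialize the Tg dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| feature_levels | List[str] | Feature levels to include | ALL_FEATURE_LEVELS |
| subset | Optional[Collection[int]] | Indices to include in the dataset | None |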
Source code in src/polymetrix/datasets/curated_tg_dataset.py
_filter_columns(prefixes)
¶
Helper to filter columns by prefix(es).
_load_data(subset=None)
¶
Load and prepare the dataset.
Source code in src/polymetrix/datasets/curated_tg_dataset.py
dataset
¶
AbstractDataset
¶
Bases: ABC
Base class for polymer datasets.
Source code in src/polymetrix/datasets/dataset.py
available_features
property
¶
List of available features.
Returns:
| Type | Description |
|---|---|
| list[str] | List of feature names |
available_labels
property
¶
List of available labels.
Returns:
| Type | Description |
|---|---|
| list[str] | List of label names |
meta_info
property
¶
List of available metadata fields.
Returns:
| Type | Description |
|---|---|
| list[str] | List of metadata field names |
psmiles
property
¶
Return the polymer SMILES strings.
Returns:
| Type | Description |
|---|---|
| np.ndarray | Array of polymer SMILES strings |
__init__()
¶
Initialize a dataset.
__iter__()
¶
__len__()
¶
_load_data(subset=None)
abstractmethod
¶
Load and prepare the dataset-specific data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subset | Optional[Collection[int]] | Indices to include in the dataset. | None |
get_features(idx, feature_names=None)
¶
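Get features for specified indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | Collection[int] | Indices of entries. | required |
| feature_names | Optional[Collection[str]] | Names of features to return. If None, returns all available features. | None |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Array of feature values. |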
Source code in src/polymetrix/datasets/dataset.py
get_labels(idx, label_names=None)
¶
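Get labels for specified indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | Collection[int] | Indices of entries. | required |
| label_names | Optional[Collection[str]] | Names of labels to return. If None, returns all available labels. | None |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Array of label values. |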
Source code in src/polymetrix/datasets/dataset.py
get_meta(idx, meta_keys=None)
¶
Get metadata for specified indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | Collection[int] | Indices of entries. | required |
| meta_keys | Optional[Collection[str]] | Names of metadata fields to return. If None, returns all available metadata. | None |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Array of metadata values. |
Source code in src/polymetrix/datasets/dataset.py
get_subset(indices)
¶
Get a subset of the dataset.
Source code in src/polymetrix/datasets/dataset.py
embedding
¶
evaluate_model(y_true, y_pred)
¶
Calculate evaluation metrics
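The reference does not list which metrics `evaluate_model` returns. As a sketch, assuming the usual regression metrics (MAE, RMSE, R²), the computation could be written in plain Python as follows. The function name and the returned keys are illustrative, not taken from the source.

```python
import math
from typing import Dict, Sequence


def evaluate_model_sketch(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Hypothetical stand-in returning MAE, RMSE and R^2 for a regression task."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all targets are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


metrics = evaluate_model_sketch([300.0, 350.0, 400.0], [310.0, 350.0, 390.0])
print(metrics)
```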
experiment_1_molformer_psmiles(train_df, test_df)
¶
Experiment 1: MoLFormer pSMILES embeddings
Source code in src/polymetrix/embedding.py
experiment_2_molformer_bigsmiles(train_df, test_df)
¶
Experiment 2: MoLFormer BigSMILES embeddings
Source code in src/polymetrix/embedding.py
experiment_3_molformer_combined(train_df, test_df)
¶
Experiment 3: Combined MoLFormer pSMILES + BigSMILES embeddings (doubled dataset)
Source code in src/polymetrix/embedding.py
load_datasets()
¶
Load training and test datasets
Source code in src/polymetrix/embedding.py
main()
¶
Main execution function with MoLFormer experiments
Source code in src/polymetrix/embedding.py
parse_molformer_embedding_column(df, column_name)
¶
Parse MoLFormer embedding column from list representation to numpy array
Source code in src/polymetrix/embedding.py
run_experiment(X_train, y_train, X_test, y_test, experiment_name)
¶
Run experiment with multiple seeds and return results
Source code in src/polymetrix/embedding.py
embedding_copy
¶
evaluate_model(y_true, y_pred)
¶
Calculate evaluation metrics
load_datasets()
¶
Load training and test datasets
Source code in src/polymetrix/embedding_copy.py
main()
¶
Main execution function
Source code in src/polymetrix/embedding_copy.py
parse_embedding_column(df, column_name)
¶
Parse embedding column from string representation to numpy array
Source code in src/polymetrix/embedding_copy.py
task1_baseline_ecfp(train_df, test_df)
¶
Task 1: Baseline model using ECFP fingerprints
Source code in src/polymetrix/embedding_copy.py
task2_combined_embeddings(train_df, test_df)
¶
Task 2: Combined embeddings model
Source code in src/polymetrix/embedding_copy.py
featurizers
¶
base_featurizer
¶
BaseFeatureCalculator
¶
Source code in src/polymetrix/featurizers/base_featurizer.py
_sanitize(mol, sanitize)
¶
Handle molecule sanitization with kekulization exception handling.
Source code in src/polymetrix/featurizers/base_featurizer.py
aggregate(features)
¶
Aggregates a list of features using the aggregation functions specified in self.agg. If the features are numpy arrays, the aggregation is applied along the first axis. Otherwise, the aggregation is applied directly (assuming the features are scalar numeric values).
Source code in src/polymetrix/featurizers/base_featurizer.py
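The two branches described for `aggregate` can be sketched as follows. This assumes `self.agg` is a list of names such as `"mean"` or `"max"` that map to NumPy reductions; the mapping, the function name, and the flat return layout are assumptions for illustration, not the library's implementation.

```python
from typing import List, Sequence

import numpy as np

# Assumed mapping from aggregation names to NumPy reductions.
AGG_FUNCS = {"mean": np.mean, "max": np.max, "min": np.min, "sum": np.sum}


def aggregate_sketch(features: Sequence, agg: List[str]) -> np.ndarray:
    """Hypothetical stand-in for the documented aggregation behaviour."""
    results = []
    for name in agg:
        func = AGG_FUNCS[name]
        if isinstance(features[0], np.ndarray):
            # Array features: stack them and reduce along the first axis.
            results.append(func(np.stack(features), axis=0))
        else:
            # Scalar features: reduce the flat list directly.
            results.append(func(features))
    # Concatenate so that scalar and array results share one flat vector.
    return np.concatenate([np.atleast_1d(r) for r in results])


scalars = aggregate_sketch([1.0, 2.0, 6.0], ["mean", "max"])
arrays = aggregate_sketch([np.array([1.0, 4.0]), np.array([3.0, 0.0])], ["mean", "max"])
print(scalars, arrays)
```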
MoleculeFeaturizer
¶
Base class for featurizers that work with general molecules.
Source code in src/polymetrix/featurizers/base_featurizer.py
chemical_featurizer
¶
BalabanJIndex
¶
Bases: GenericScalarFeaturizer
Measures molecular complexity and connectivity of atoms.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
BondCounts
¶
Bases: BaseFeatureCalculator
Counts the number of single, double, and triple bonds in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
BridgingRingsCount
¶
Bases: BaseFeatureCalculator
Counts the number of bridging rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
FpDensityMorgan1
¶
Bases: GenericScalarFeaturizer
Calculates the density of the Morgan1 fingerprint.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
FractionBicyclicRings
¶
Bases: BaseFeatureCalculator
Calculates the fraction of bicyclic rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
HalogenCounts
¶
Bases: BaseFeatureCalculator
Counts the number of halogen atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
HeteroatomCount
¶
Bases: BaseFeatureCalculator
Counts heteroatoms (non-C, non-H) in heterocyclic rings.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
HeteroatomDensity
¶
Bases: BaseFeatureCalculator
Density of heteroatoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
MaxEStateIndex
¶
Bases: GenericScalarFeaturizer
Maximum electrotopological state (E-state) index, reflecting charge distribution.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
MaxRingSize
¶
Bases: BaseFeatureCalculator
Calculates the size of the largest ring in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
MolecularWeight
¶
Bases: GenericScalarFeaturizer
Calculates the molecular weight of the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumAliphaticHeterocycles
¶
Bases: BaseFeatureCalculator
Counts the number of aliphatic heterocycles in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumAromaticRings
¶
Bases: BaseFeatureCalculator
Counts the number of aromatic rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumAtoms
¶
Bases: BaseFeatureCalculator
Counts the number of atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumHBondAcceptors
¶
Bases: GenericScalarFeaturizer
Counts the number of hydrogen bond acceptors.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumHBondDonors
¶
Bases: GenericScalarFeaturizer
Counts the number of hydrogen bond donors.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumNonAromaticRings
¶
Bases: BaseFeatureCalculator
Counts the number of non-aromatic rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumRings
¶
Bases: BaseFeatureCalculator
Counts the number of rings in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
NumRotatableBonds
¶
Bases: GenericScalarFeaturizer
Counts the number of rotatable bonds.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
SlogPVSA1
¶
Bases: GenericScalarFeaturizer
Calculates the surface area contributing to octanol solubility, linked to lipophilicity.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
SmrVSA5
¶
Bases: GenericScalarFeaturizer
Sum of the van der Waals surface area of atoms whose molar refractivity contribution lies in the range 2.45–2.75.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
Sp2CarbonCountFeaturizer
¶
Bases: BaseFeatureCalculator
Counts the number of sp2 hybridized carbon atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
Sp3CarbonCountFeaturizer
¶
Bases: BaseFeatureCalculator
Counts the number of sp3 hybridized carbon atoms in the molecule.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
TopologicalSurfaceArea
¶
Bases: GenericScalarFeaturizer
Calculates the topological polar surface area.
Source code in src/polymetrix/featurizers/chemical_featurizer.py
comparator
¶
PolymerMoleculeComparator
¶
Comparator that computes various comparison metrics between polymer and molecule features.
Source code in src/polymetrix/featurizers/comparator.py
aggregate(features)
¶
Aggregate features across comparison methods.
Source code in src/polymetrix/featurizers/comparator.py
compare(polymer, molecule)
¶
Return comparison metrics between polymer and molecule features.
Source code in src/polymetrix/featurizers/comparator.py
feature_labels()
¶
Generate labels for comparison and aggregated features.
Source code in src/polymetrix/featurizers/comparator.py
molecule
¶
FullMolecularFeaturizer
¶
Bases: MoleculeFeaturizer
Featurizer for general molecules.
This class can featurize any molecule given as a Molecule object that contains a SMILES string and an RDKit molecule object.
Source code in src/polymetrix/featurizers/molecule.py
featurize(molecule)
¶
Featurize a molecule object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| molecule | | A Molecule object with a mol property containing an RDKit molecule. | required |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Feature vector calculated by the underlying calculator. |
Source code in src/polymetrix/featurizers/molecule.py
Molecule
¶
A class to represent a general molecule from a SMILES string.
Attributes:
| Name | Type | Description |
|---|---|---|
| smiles | Optional[str] | The SMILES string representing the molecule. |
| mol | Optional[Chem.Mol] | The RDKit molecule object. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the provided SMILES string is invalid or cannot be processed. |
Source code in src/polymetrix/featurizers/molecule.py
mol
property
¶
Gets the RDKit molecule object.
Returns:
| Type | Description |
|---|---|
| Optional[Chem.Mol] | The RDKit molecule object, or None if not set. |
smiles
property
writable
¶
Gets the SMILES string of the molecule.
Returns:
| Type | Description |
|---|---|
| Optional[str] | The SMILES string, or None if not set. |
calculate_molecular_weight()
¶
Calculates the exact molecular weight of the molecule.
Returns:
| Type | Description |
|---|---|
| float | The molecular weight of the molecule. |
Source code in src/polymetrix/featurizers/molecule.py
from_smiles(smiles)
classmethod
¶
Creates a Molecule instance from a SMILES string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| smiles | str | The SMILES string representing the molecule. | required |
Returns:
| Type | Description |
|---|---|
| Molecule | A new Molecule object initialized with the given SMILES string. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the SMILES string is invalid. |
Source code in src/polymetrix/featurizers/molecule.py
multiple_featurizer
¶
MultipleFeaturizer
¶
Source code in src/polymetrix/featurizers/multiple_featurizer.py
feature_labels()
¶
Return feature labels with '_with_terminalgroups' suffix when applicable.
Source code in src/polymetrix/featurizers/multiple_featurizer.py
polymer
¶
Polymer
¶
Represents a polymer molecule with its backbone and sidechain information.
Attributes:
| Name | Type | Description |
|---|---|---|
| psmiles | Optional[str] | The pSMILES string of the polymer. |
| backbone_terminal_groups | Optional[Dict[str, str]] | Maps connection point patterns to backbone terminal group SMILES. |
| sidechain_terminal_groups | Optional[Dict[str, str]] | Maps connection point patterns to sidechain terminal group SMILES. |
| graph | Optional[nx.Graph] | The NetworkX graph of the polymer structure. |
| backbone_nodes | Optional[List[int]] | Node indices forming the backbone. |
| sidechain_nodes | Optional[List[int]] | Node indices forming the sidechains. |
| connection_points | Optional[List[int]] | Node indices of connection points. |
| _mol | Optional[Chem.Mol] | The RDKit molecule object (internal use). |
Source code in src/polymetrix/featurizers/polymer.py
backbone_molecule
property
¶
Gets the backbone molecule.
backbone_terminal_groups
property
writable
¶
Maps connection point patterns to backbone terminal group SMILES.
full_polymer_mol
property
¶
Gets the full polymer molecule.
mol
property
¶
Returns the full polymer molecule, compatible with featurizers expecting a 'mol' attribute.
psmiles
property
writable
¶
The pSMILES string of the polymer.
sidechain_molecules
property
¶
Gets the sidechain molecules.
sidechain_terminal_groups
property
writable
¶
Maps connection point patterns to sidechain terminal group SMILES.
_extract_substructure_mol(node_indices)
¶
Extracts a substructure molecule from the main molecule using node indices.
Source code in src/polymetrix/featurizers/polymer.py
_get_backbone_molecule(include_terminal_groups=True)
¶
Internal method to get backbone molecule with optional terminal groups.
Source code in src/polymetrix/featurizers/polymer.py
_get_full_polymer_mol(include_terminal_groups=True)
¶
Internal method to get full polymer molecule with optional terminal groups.
Source code in src/polymetrix/featurizers/polymer.py
_get_sidechain_molecules(include_terminal_groups=True)
¶
Internal method to get sidechain molecules with optional terminal groups.
Source code in src/polymetrix/featurizers/polymer.py
_identify_backbone_and_sidechain()
¶
Classifies nodes into backbone and sidechain components.
_identify_connection_points()
¶
Identifies connection points (asterisk atoms) in the polymer graph.
_mol_to_nx(mol)
staticmethod
¶
Converts an RDKit molecule to a NetworkX graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mol | Mol | The RDKit molecule to convert. | required |
Returns:
| Type | Description |
|---|---|
| Graph | A NetworkX graph representing the molecule's structure. |
Source code in src/polymetrix/featurizers/polymer.py
calculate_molecular_weight()
¶
Calculates the exact molecular weight of the polymer.
Returns:
| Type | Description |
|---|---|
| float | The molecular weight of the polymer. |
from_psmiles(psmiles)
classmethod
¶
Creates a Polymer instance from a pSMILES string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| psmiles | str | The pSMILES string representing the polymer. | required |
Returns:
| Type | Description |
|---|---|
| Polymer | A new Polymer instance. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the pSMILES string is invalid. |
Source code in src/polymetrix/featurizers/polymer.py
get_backbone_and_sidechain_graphs()
¶
Extracts NetworkX graphs for the backbone and sidechains.
Returns:
| Type | Description |
|---|---|
| Tuple[Graph, List[Graph]] | A tuple of (backbone graph, list of sidechain graphs). |
Source code in src/polymetrix/featurizers/polymer.py
get_backbone_and_sidechain_molecules()
¶
Extracts RDKit molecules for the backbone and sidechains.
Returns:
| Type | Description |
|---|---|
| Tuple[List[Mol], List[Mol]] | A tuple of (list of backbone molecules, list of sidechain molecules). |
Source code in src/polymetrix/featurizers/polymer.py
get_connection_points()
¶
Gets the connection point node indices.
Returns:
| Type | Description |
|---|---|
| List[int] | List of node indices representing connection points. |
add_degree_one_nodes_to_backbone(graph, backbone)
¶
Adds degree-1 nodes connected to backbone nodes to the backbone list, avoiding duplicates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | Graph | The input graph to analyze. | required |
| backbone | List[int] | Initial list of backbone node indices. | required |
Returns:
| Type | Description |
|---|---|
| List[int] | Updated backbone list including degree-1 nodes, with no duplicates. |
Source code in src/polymetrix/featurizers/polymer.py
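As a sketch of the behaviour described for `add_degree_one_nodes_to_backbone`, the example below uses a plain adjacency dictionary in place of a NetworkX graph (an assumption made so the example has no dependencies). The function name is hypothetical and the traversal order is a choice of this sketch.

```python
from typing import Dict, List, Set


def add_degree_one_nodes_sketch(
    adjacency: Dict[int, Set[int]], backbone: List[int]
) -> List[int]:
    """Hypothetical stand-in: extend the backbone with its degree-1 neighbours."""
    updated = list(backbone)
    seen = set(backbone)
    for node in backbone:
        for neighbour in sorted(adjacency[node]):
            # A degree-1 node has exactly one bond, here to a backbone atom.
            if len(adjacency[neighbour]) == 1 and neighbour not in seen:
                updated.append(neighbour)
                seen.add(neighbour)
    return updated


# Chain 0-1-2 with a leaf (3) on node 1 and a two-atom branch (4-5) on node 2.
adjacency = {0: {1}, 1: {0, 2, 3}, 2: {1, 4}, 3: {1}, 4: {2, 5}, 5: {4}}
result = add_degree_one_nodes_sketch(adjacency, [0, 1, 2])
print(result)
```

Node 3 joins the backbone because it is a leaf bonded to a backbone atom, while nodes 4 and 5 stay out: node 4 has degree 2, and node 5 is not adjacent to the backbone.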
attach_terminal_to_atom(mol, target_idx, terminal_mol, attachment_idx=None)
¶
Attaches a terminal group to a specific atom in the molecule.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mol | RWMol | The molecule being modified. | required |
| target_idx | int | Index of the target atom to attach the terminal group. | required |
| terminal_mol | Mol | The terminal group molecule. | required |
| attachment_idx | int | Index of the attachment point in the terminal group (optional for sidechains). | None |
Returns:
| Type | Description |
|---|---|
| RWMol | The modified molecule. |
Source code in src/polymetrix/featurizers/polymer.py
classify_backbone_and_sidechains(graph)
¶
Classifies nodes into backbone and sidechain components based on paths and cycles.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | Graph | The input graph to classify. | required |
Returns:
| Type | Description |
|---|---|
| Tuple[List[int], List[int]] | A tuple of (backbone nodes, sidechain nodes). |
Source code in src/polymetrix/featurizers/polymer.py
find_cycles_including_paths(graph, paths)
¶
Identifies cycles that include nodes from the given paths.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | Graph | The input graph to analyze. | required |
| paths | List[List[int]] | List of paths whose nodes are used to filter cycles. | required |
Returns:
| Type | Description |
|---|---|
| List[List[int]] | List of cycles, where each cycle is a list of node indices. |
Source code in src/polymetrix/featurizers/polymer.py
find_shortest_paths_between_stars(graph)
¶
Finds shortest paths between all pairs of asterisk (*) nodes in the graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | Graph | The input graph to analyze. | required |
Returns:
| Type | Description |
|---|---|
| List[List[int]] | List of shortest paths, where each path is a list of node indices. |
Source code in src/polymetrix/featurizers/polymer.py
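The idea behind `find_shortest_paths_between_stars` can be sketched with a breadth-first search. The example assumes a plain adjacency dictionary plus a separate element map instead of a NetworkX graph, because the node-attribute layout of the real graph is not shown in this reference. All names below are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Set


def shortest_path(adjacency: Dict[int, Set[int]], start: int, goal: int) -> List[int]:
    """Breadth-first search returning one shortest path (empty if unreachable)."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(adjacency[node]):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return []


def star_paths_sketch(
    adjacency: Dict[int, Set[int]], elements: Dict[int, str]
) -> List[List[int]]:
    """Hypothetical stand-in: shortest paths between every pair of '*' nodes."""
    stars = sorted(node for node, symbol in elements.items() if symbol == "*")
    return [shortest_path(adjacency, a, b) for a, b in combinations(stars, 2)]


# Toy graph for *-C-C(-C)-*: node 3 is a one-atom sidechain on node 2.
elements = {0: "*", 1: "C", 2: "C", 3: "C", 4: "*"}
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
paths = star_paths_sketch(adjacency, elements)
print(paths)
```

The sidechain atom (node 3) does not appear in the path, which is what makes such paths a natural starting point for identifying the backbone.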
insert_terminal_group(mol, terminal_groups, is_sidechain=False)
¶
Inserts terminal groups into a molecule by replacing connection points or attaching to sidechains.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mol | Mol | The RDKit molecule to modify. | required |
| terminal_groups | Dict[str, str] | Dictionary mapping patterns to terminal group SMILES. | required |
| is_sidechain | bool | If True, attach terminal groups to sidechains; else, replace backbone connection points. | False |
Returns:
| Type | Description |
|---|---|
| Mol | A new RDKit molecule with terminal groups inserted. |
Source code in src/polymetrix/featurizers/polymer.py
replace_asterisk_with_terminal(mol, asterisk_idx, terminal_mol, attachment_idx)
¶
Replaces a single asterisk atom with a terminal group.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mol | RWMol | The molecule being modified. | required |
| asterisk_idx | int | Index of the asterisk atom to replace. | required |
| terminal_mol | Mol | The terminal group molecule. | required |
| attachment_idx | int | Index of the attachment point in the terminal group. | required |
Returns:
| Type | Description |
|---|---|
| RWMol | The modified molecule. |
Source code in src/polymetrix/featurizers/polymer.py
sidechain_backbone_featurizer
¶
SidechainDiversityFeaturizer
¶
Bases: BaseFeatureCalculator
Computes the number of structurally diverse sidechains in a polymer based on graph isomorphism.
Source code in src/polymetrix/featurizers/sidechain_backbone_featurizer.py
SidechainLengthToStarAttachmentDistanceRatioFeaturizer
¶
Bases: BaseFeatureCalculator
Computes aggregated ratios of sidechain lengths to the shortest backbone distance from the polymer's star node (*) to each sidechain's attachment point.
Source code in src/polymetrix/featurizers/sidechain_backbone_featurizer.py
_compute_min_backbone_length(sidechain, star_nodes, star_paths, graph)
¶
Calculate the minimum backbone distance from any star node to the sidechain's attachment point.
Source code in src/polymetrix/featurizers/sidechain_backbone_featurizer.py
StarToSidechainMinDistanceFeaturizer
¶
Bases: BaseFeatureCalculator
Computes aggregated minimum backbone distances from star nodes (*) to sidechains in a polymer.
Source code in src/polymetrix/featurizers/sidechain_backbone_featurizer.py
kmeans
¶
parse_fingerprint(fp_string)
¶
Parse fingerprint string to numpy array
Source code in src/polymetrix/kmeans.py
splitters
¶
splitters
¶
PolymerClassSplitter
¶
Bases: BaseSplitter
Splitter based on polymer class
Source code in src/polymetrix/splitters/splitters.py
TgSplitter
¶
Bases: BaseSplitter
Splitter based on Tg values
Source code in src/polymetrix/splitters/splitters.py
__init__(ds, tg_q=None, label_name='labels.Exp_Tg(K)', shuffle=True, random_state=None, **kwargs)
¶
Initialize TgSplitter
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | AbstractDataset | Dataset to split | required |
| tg_q | Optional[Collection[float]] | Quantiles to bin Tg values into groups | None |
| label_name | str | Name of the label to use for splitting | 'labels.Exp_Tg(K)' |
| shuffle | bool | Whether to shuffle the dataset | True |
| random_state | Optional[Union[int, RandomState]] | Random state for shuffling | None |
| **kwargs | | Additional arguments to pass to BaseSplitter | {} |
Source code in src/polymetrix/splitters/splitters.py
_get_groups()
¶
Bin Tg values into quantile-based groups
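A minimal sketch of quantile-based binning, assuming `tg_q` holds quantiles between 0 and 1 that act as bin edges. The interpolation rule, the handling of values that fall exactly on an edge, and both function names are assumptions of this sketch; the actual `_get_groups` may differ.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of pre-sorted data, for 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def tg_groups_sketch(tg_values: Sequence[float], tg_q: Sequence[float]) -> List[int]:
    """Hypothetical stand-in: map each Tg value to a quantile-bin index."""
    ordered = sorted(tg_values)
    edges = [quantile(ordered, q) for q in sorted(tg_q)]
    # bisect_right counts how many edges lie at or below the value.
    return [bisect_right(edges, tg) for tg in tg_values]


tg_values = [250.0, 300.0, 350.0, 400.0, 450.0]
groups = tg_groups_sketch(tg_values, tg_q=[0.25, 0.5, 0.75])
print(groups)
```

Group indices like these can then be handed to a group-aware splitter so that each Tg range stays together in one fold.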