Regression & ThermoML¶
Differentiable parameter estimation (Levenberg-Marquardt and gradient descent), UNIFAC-to-binary prediction, the bundled ThermoML parameter bank, and the NIST ThermoML archive reader.
Regression¶
regression
¶
Differentiable parameter estimation for activity-coefficient models.
Because every equilibrium output is differentiable with respect to the
activity-model parameters (see fugacio.thermo.gammaphi,
fugacio.thermo.lle), fitting a model to data is plain gradient-based
optimisation: no finite-difference parameter sweeps, no black-box derivatives.
This module supplies:
- two self-contained optimisers, levenberg_marquardt for nonlinear least squares (exact Gauss-Newton Hessian, adaptive damping) and a simple gradient_descent, that operate on an arbitrary parameter pytree;
- residual builders that turn experimental data into a residual vector: bubble_pressure_residuals (isothermal/isobaric P-x-y VLE), activity_residuals (measured ln gamma), and lle_residuals (mutual-solubility / tie-line data); and
- convenience fitters (fit_nrtl_binary, fit_uniquac_binary) that wire a model factory to the optimiser and return a ready model object.
A "model factory" is any theta -> ActivityModel mapping; the optimiser fits
theta (the differentiable leaves you choose to expose), so you control which
parameters are free and which are fixed.
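The factory idea can be sketched in plain Python. This is not Fugacio's ActivityModel interface: make_nrtl_factory and the returned ln_gamma callable are hypothetical stand-ins, and the parameter values are illustrative only. The closure exposes b12 and b21 as the free parameters and holds alpha fixed.

```python
import math


def make_nrtl_factory(alpha=0.3):
    """Return a hypothetical factory theta -> ln_gamma(t, x1) for binary NRTL.

    alpha is captured by the closure, so it stays fixed; only the entries of
    theta (b12, b21, in kelvin) are exposed to an optimiser.
    """

    def make_model(theta):
        b12, b21 = theta

        def ln_gamma(t, x1):
            x2 = 1.0 - x1
            tau12, tau21 = b12 / t, b21 / t  # a = 0, so tau = b / T
            g12 = math.exp(-alpha * tau12)
            g21 = math.exp(-alpha * tau21)
            d1 = x1 + x2 * g21
            d2 = x2 + x1 * g12
            ln_g1 = x2**2 * (tau21 * (g21 / d1) ** 2 + tau12 * g12 / d2**2)
            ln_g2 = x1**2 * (tau12 * (g12 / d2) ** 2 + tau21 * g21 / d1**2)
            return ln_g1, ln_g2

        return ln_gamma

    return make_model


make_model = make_nrtl_factory(alpha=0.3)
model = make_model((300.0, 150.0))  # illustrative parameter values
ln_g1, ln_g2 = model(350.0, 0.4)
```

Making alpha a free parameter would only require moving it from the closure into theta; the optimiser itself would not change.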
Functions:
| Name | Description |
|---|---|
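| levenberg_marquardt | Minimise 0.5 * sum(residual(theta)**2) by Levenberg-Marquardt. |
| gradient_descent | Minimise a scalar objective(theta) by fixed-step gradient descent. |
| bubble_pressure_residuals | Residuals of predicted vs. measured bubble pressure (and optionally vapour). |
| activity_residuals | Residuals of predicted vs. measured log activity coefficients. |
| lle_residuals | Isoactivity residuals at measured liquid-liquid tie-line ends. |
| fit_nrtl_binary | Fit binary NRTL b parameters (fixed alpha) to bubble-point data. |
| unifac_ln_gamma_grid | Sample (modified) UNIFAC ln gamma on a binary composition/temperature grid. |
| predict_nrtl_from_unifac | Predict binary NRTL b parameters by fitting to UNIFAC activity coefficients. |
| predict_uniquac_from_unifac | Predict binary UNIQUAC b parameters by fitting to UNIFAC activity coefficients. |
| fit_uniquac_binary | Fit binary UNIQUAC b parameters (with given r, q) to bubble-point data. |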
levenberg_marquardt
¶
levenberg_marquardt(
residual: ResidualFn,
theta0: Any,
*,
max_iter: int = 100,
lambda0: float = 0.01,
factor: float = 5.0,
tol: float = 1e-12,
) -> tuple[Any, Array]
Minimise 0.5 * sum(residual(theta)**2) by Levenberg-Marquardt.
A trust-region blend of Gauss-Newton and gradient descent: each step solves
(J^T J + lambda diag(J^T J)) delta = -J^T r with the exact Jacobian J
(via jax.jacobian), shrinking lambda after an accepted step and
growing it after a rejected one. Operates on any parameter pytree theta0.
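The damped step and the accept/reject rule can be sketched for a two-parameter toy problem in plain Python. This is an illustration under stated assumptions, not the library's implementation: lm_fit is a hypothetical name, the model y = a * exp(b * x) and its hand-written Jacobian stand in for the jax.jacobian call, and the 2x2 system is solved by Cramer's rule.

```python
import math


def lm_fit(xs, ys, theta0, max_iter=100, lam=0.01, factor=5.0):
    """Fit y = a * exp(b * x) by a minimal Levenberg-Marquardt loop."""

    def residuals(a, b):
        return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]

    def cost(r):
        return 0.5 * sum(ri * ri for ri in r)

    a, b = theta0
    r = residuals(a, b)
    for _ in range(max_iter):
        # Analytic Jacobian rows (dr/da, dr/db) for this toy model.
        jac = [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]
        jtj00 = sum(j0 * j0 for j0, _ in jac)
        jtj01 = sum(j0 * j1 for j0, j1 in jac)
        jtj11 = sum(j1 * j1 for _, j1 in jac)
        g0 = sum(j[0] * ri for j, ri in zip(jac, r))
        g1 = sum(j[1] * ri for j, ri in zip(jac, r))
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = -J^T r.
        m00 = jtj00 * (1.0 + lam)
        m11 = jtj11 * (1.0 + lam)
        det = m00 * m11 - jtj01 * jtj01
        if det == 0.0:
            lam *= factor
            continue
        da = (-g0 * m11 + jtj01 * g1) / det
        db = (-g1 * m00 + jtj01 * g0) / det
        trial = residuals(a + da, b + db)
        if cost(trial) < cost(r):
            a, b, r = a + da, b + db, trial  # accepted: shrink the damping
            lam /= factor
        else:
            lam *= factor  # rejected: grow the damping and retry
    return (a, b), cost(r)


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(-x) for x in xs]  # synthetic, noise-free data
theta, final_cost = lm_fit(xs, ys, (1.0, 0.0))
```

With small lam the step approaches Gauss-Newton; with large lam it becomes a short step along the scaled negative gradient, which is what makes rejected steps safe to retry.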
Returns:
| Type | Description |
|---|---|
| Any | The fitted parameter pytree. |
| Array | The final half-sum-of-squares cost. |
gradient_descent
¶
gradient_descent(
objective: Callable[[Any], Array],
theta0: Any,
*,
learning_rate: float = 0.01,
max_iter: int = 500,
) -> tuple[Any, Array]
Minimise a scalar objective(theta) by fixed-step gradient descent.
A dependency-free fallback for objectives that are not least-squares; returns
(theta, objective(theta)).
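A fixed-step update over a dictionary of parameters can be sketched as follows. The names are hypothetical and the gradient is written by hand here, whereas a JAX-based implementation would presumably obtain it by automatic differentiation.

```python
def gradient_descent_sketch(grad, theta0, learning_rate=0.01, max_iter=500):
    """Apply theta <- theta - learning_rate * grad(theta) a fixed number of times."""
    theta = dict(theta0)
    for _ in range(max_iter):
        g = grad(theta)
        theta = {name: theta[name] - learning_rate * g[name] for name in theta}
    return theta


def objective(theta):
    # A convex bowl with its minimum at a = 3, b = -1.
    return (theta["a"] - 3.0) ** 2 + 2.0 * (theta["b"] + 1.0) ** 2


def grad(theta):
    # Hand-written gradient of the objective above.
    return {"a": 2.0 * (theta["a"] - 3.0), "b": 4.0 * (theta["b"] + 1.0)}


theta_opt = gradient_descent_sketch(grad, {"a": 0.0, "b": 0.0}, learning_rate=0.1)
final_value = objective(theta_opt)
```

The step size has to suit the curvature of the objective: on this bowl a learning rate of 0.5 or more would make the b coordinate oscillate or diverge.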
bubble_pressure_residuals
¶
bubble_pressure_residuals(
make_model: ModelFactory,
t: Array,
x: Array,
p_exp: Array,
tc: Array,
pc: Array,
omega: Array,
*,
y_exp: Array | None = None,
p_scale: ArrayLike | None = None,
y_weight: float = 1.0,
**opts: Any,
) -> ResidualFn
Residuals of predicted vs. measured bubble pressure (and optionally vapour).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| make_model | ModelFactory | | required |
| t | Array | Temperatures (K), shape | required |
| x | Array | Liquid compositions, shape | required |
| p_exp | Array | Measured bubble pressures (Pa), shape | required |
| tc | Array | Component critical temperatures (K). | required |
| pc | Array | Component critical pressures (Pa). | required |
| omega | Array | Component acentric factors. | required |
| y_exp | Array \| None | Optional measured vapour compositions, shape | None |
| p_scale | ArrayLike \| None | Pressure normaliser (defaults to | None |
| y_weight | float | Relative weight on the vapour-composition residuals. | 1.0 |
| **opts | Any | Forwarded to | {} |
Returns:
| Type | Description |
|---|---|
| ResidualFn | |
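The shape of such a residual builder can be sketched under simplifying assumptions that differ from the function documented above: an ideal vapour phase (modified Raoult's law), saturation pressures passed in directly instead of being derived from tc, pc and omega, and each pressure residual scaled by its own measured pressure. All names are hypothetical, and a one-parameter Margules model stands in for the activity model.

```python
import math


def bubble_residuals_sketch(make_model, t, x1, p_exp, psat, y1_exp=None, y_weight=1.0):
    """Return residual(theta) for binary P-x(-y) data under modified Raoult's law.

    make_model(theta) must return ln_gamma(t, x1) -> (ln_g1, ln_g2);
    psat holds one (psat1, psat2) pair in Pa per data row.
    """

    def residual(theta):
        model = make_model(theta)
        out = []
        for i in range(len(t)):
            ln_g1, ln_g2 = model(t[i], x1[i])
            p1 = x1[i] * math.exp(ln_g1) * psat[i][0]  # partial pressures
            p2 = (1.0 - x1[i]) * math.exp(ln_g2) * psat[i][1]
            p_calc = p1 + p2
            out.append((p_calc - p_exp[i]) / p_exp[i])  # scaled pressure residual
            if y1_exp is not None:
                out.append(y_weight * (p1 / p_calc - y1_exp[i]))
        return out

    return residual


def make_margules(theta):
    # Stand-in activity model: ln g1 = (a/T) x2^2, ln g2 = (a/T) x1^2.
    (a,) = theta
    return lambda t, x1: (a / t * (1.0 - x1) ** 2, a / t * x1**2)


# Synthetic "measurements" generated from a known parameter, a = 400 K.
t = [350.0, 350.0, 350.0]
x1 = [0.2, 0.5, 0.8]
psat = [(100.0e3, 40.0e3)] * 3
truth = make_margules((400.0,))
p_exp = []
for ti, xi, (ps1, ps2) in zip(t, x1, psat):
    lg1, lg2 = truth(ti, xi)
    p_exp.append(xi * math.exp(lg1) * ps1 + (1.0 - xi) * math.exp(lg2) * ps2)

residual = bubble_residuals_sketch(make_margules, t, x1, p_exp, psat)
```

Scaling the pressure residuals keeps them dimensionless and of order one, so that they can be mixed with composition residuals through y_weight.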
activity_residuals
¶
activity_residuals(
make_model: ModelFactory,
t: Array,
x: Array,
ln_gamma_exp: Array,
) -> ResidualFn
Residuals of predicted vs. measured log activity coefficients.
t is shape (m,), x and ln_gamma_exp are shape (m, n).
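A residual builder of this kind is a closure over the data. The sketch below uses hypothetical names and a two-parameter Margules model as a stand-in; it flattens the (m, n) table of differences into a single residual vector.

```python
def activity_residuals_sketch(make_model, t, x, ln_gamma_exp):
    """Return residual(theta): predicted minus measured ln gamma, flattened."""

    def residual(theta):
        model = make_model(theta)
        out = []
        for ti, xi, lg_exp in zip(t, x, ln_gamma_exp):
            lg_calc = model(ti, xi)
            out.extend(c - e for c, e in zip(lg_calc, lg_exp))
        return out

    return residual


def make_margules2(theta):
    # Stand-in model: two-parameter Margules, independent of temperature.
    a12, a21 = theta

    def ln_gamma(t, x):
        x1, x2 = x
        return (
            x2**2 * (a12 + 2.0 * (a21 - a12) * x1),
            x1**2 * (a21 + 2.0 * (a12 - a21) * x2),
        )

    return ln_gamma


t = [300.0, 300.0, 300.0]
x = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
truth = make_margules2((1.2, 0.8))
ln_gamma_exp = [truth(ti, xi) for ti, xi in zip(t, x)]  # synthetic data

residual = activity_residuals_sketch(make_margules2, t, x, ln_gamma_exp)
r_true = residual((1.2, 0.8))
r_zero = residual((0.0, 0.0))
```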
lle_residuals
¶
Isoactivity residuals at measured liquid-liquid tie-line ends.
A consistent model makes each experimental conjugate pair iso-active:
x_i^I gamma_i^I = x_i^II gamma_i^II. t is (m,); the compositions
are (m, n).
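The isoactivity condition translates directly into a residual per component and tie line. In the sketch below (hypothetical names) a symmetric one-parameter Margules model is used because its tie lines are known in closed form: the conjugate compositions are mirror images, and the parameter that reproduces a given tie line follows from ln(s / (1 - s)) = A (2 s - 1).

```python
import math


def lle_residuals_sketch(make_model, t, x_I, x_II):
    """Return residual(theta) with entries x_i^I gamma_i^I - x_i^II gamma_i^II."""

    def residual(theta):
        model = make_model(theta)
        out = []
        for ti, xa, xb in zip(t, x_I, x_II):
            lg_a = model(ti, xa)
            lg_b = model(ti, xb)
            for i in range(len(xa)):
                out.append(xa[i] * math.exp(lg_a[i]) - xb[i] * math.exp(lg_b[i]))
        return out

    return residual


def make_symmetric_margules(theta):
    # Stand-in model: ln g1 = A x2^2, ln g2 = A x1^2 (temperature ignored).
    (a,) = theta
    return lambda t, x: (a * x[1] ** 2, a * x[0] ** 2)


s = 0.1  # mole fraction of component 1 in phase I of the synthetic tie line
a_true = math.log(s / (1.0 - s)) / (2.0 * s - 1.0)
residual = lle_residuals_sketch(
    make_symmetric_margules, [300.0], [(s, 1.0 - s)], [(1.0 - s, s)]
)
r_true = residual((a_true,))
r_off = residual((2.0,))
```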
fit_nrtl_binary
¶
fit_nrtl_binary(
t: Array,
x: Array,
p_exp: Array,
tc: Array,
pc: Array,
omega: Array,
*,
alpha: float = 0.3,
y_exp: Array | None = None,
b0: tuple[float, float] = (0.0, 0.0),
max_iter: int = 80,
**opts: Any,
) -> tuple[NRTL, Array]
Fit binary NRTL b parameters (fixed alpha) to bubble-point data.
The free parameters are the two 1/T interaction coefficients
b12, b21 (Kelvin); a = 0 and the non-randomness alpha are held
fixed. Returns (fitted NRTL, final cost).
unifac_ln_gamma_grid
¶
unifac_ln_gamma_grid(
components: list[str],
t: ArrayLike,
*,
points: int = 11,
dortmund: bool = False,
x_min: float = 0.02,
) -> tuple[Array, Array, Array]
Sample (modified) UNIFAC ln gamma on a binary composition/temperature grid.
This turns a predictive group-contribution model into pseudo-data for fitting a correlative NRTL/UNIQUAC model, the standard way to obtain binary interaction parameters for a pair that has no measured VLE.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| components | list[str] | Exactly two component names with UNIFAC group assignments. | required |
| t | ArrayLike | Temperature(s) (K); a scalar or 1-D array. The grid is the outer product of the temperatures with the composition samples. | required |
| points | int | Number of liquid compositions sampled in | 11 |
| dortmund | bool | Use modified UNIFAC (Dortmund) instead of classic UNIFAC. | False |
| x_min | float | Smallest mole fraction sampled (kept away from the pure limits). | 0.02 |
Returns:
| Type | Description |
|---|---|
| Array | |
| Array | |
Raises:
| Type | Description |
|---|---|
| ValueError | if |
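The outer-product grid can be pictured with the plain-Python sketch below. It assumes that the composition samples are spaced uniformly between x_min and 1 - x_min and that the grid is flattened temperature by temperature; neither detail is confirmed by the text above, and composition_temperature_grid is a hypothetical name. Evaluating UNIFAC at each grid point is left out.

```python
def composition_temperature_grid(temps, points=11, x_min=0.02):
    """Pair every temperature with every sampled binary composition."""
    if points < 2:
        # Guard for this sketch only; not the documented ValueError condition.
        raise ValueError("need at least two composition samples")
    step = (1.0 - 2.0 * x_min) / (points - 1)
    x1_samples = [x_min + k * step for k in range(points)]
    t_grid, x_grid = [], []
    for t in temps:
        for x1 in x1_samples:
            t_grid.append(t)
            x_grid.append((x1, 1.0 - x1))
    return t_grid, x_grid


t_grid, x_grid = composition_temperature_grid([300.0, 350.0])
```

Keeping the samples away from the pure-component limits avoids rows in which one component's ln gamma is multiplied by a vanishing mole fraction and so carries almost no information.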
predict_nrtl_from_unifac
¶
predict_nrtl_from_unifac(
components: list[str],
t: ArrayLike,
*,
alpha: float = 0.3,
dortmund: bool = False,
points: int = 11,
x_min: float = 0.02,
b0: tuple[float, float] = (0.0, 0.0),
max_iter: int = 120,
) -> tuple[NRTL, Array]
Predict binary NRTL b parameters by fitting to UNIFAC activity coefficients.
UNIFAC supplies ln gamma over a composition (and temperature) grid; the two
NRTL 1/T coefficients b12, b21 (with a = 0 and fixed alpha)
are fitted to it by levenberg_marquardt. Use it to bootstrap a
correlative model for a pair without measured data.
Returns:
| Type | Description |
|---|---|
| tuple[NRTL, Array] | |
predict_uniquac_from_unifac
¶
predict_uniquac_from_unifac(
components: list[str],
t: ArrayLike,
*,
r: Array | None = None,
q: Array | None = None,
dortmund: bool = False,
points: int = 11,
x_min: float = 0.02,
b0: tuple[float, float] = (0.0, 0.0),
max_iter: int = 120,
) -> tuple[UNIQUAC, Array]
Predict binary UNIQUAC b parameters by fitting to UNIFAC activity coefficients.
Like predict_nrtl_from_unifac, but for UNIQUAC. The surface/volume
parameters r, q default to the curated values
(fugacio.thermo.data.uniquac_rq); the free parameters are the 1/T
coefficients of tau = exp(b/T).
Returns:
| Type | Description |
|---|---|
| tuple[UNIQUAC, Array] | |
fit_uniquac_binary
¶
fit_uniquac_binary(
t: Array,
x: Array,
p_exp: Array,
tc: Array,
pc: Array,
omega: Array,
r: Array,
q: Array,
*,
y_exp: Array | None = None,
b0: tuple[float, float] = (0.0, 0.0),
max_iter: int = 80,
**opts: Any,
) -> tuple[UNIQUAC, Array]
Fit binary UNIQUAC b parameters (with given r, q) to bubble-point data.
Free parameters are the 1/T coefficients of ln tau = a + b/T with
a = 0. Returns (fitted UNIQUAC, final cost).
Parameter bank¶
parameter_bank
¶
Batch regression of ThermoML datasets into a reusable binary-parameter bank.
This is the bridge between the ThermoML reader (fugacio.thermo.thermoml)
and the differentiable fitters (fugacio.thermo.regression): point it at
parsed documents (or the bundled samples) and it fits a binary activity model to
every isothermal P-x VLE table it can understand, recording the fitted
parameters together with their provenance: source dataset, temperature range,
point count, and the scaled root-mean-square pressure residual. The collected
FittedBinary records form a ParameterBank that hands back
ready-to-use NRTL models with the
component ordering you ask for.
A bank fitted to the bundled samples ships with the package
(parameter_bank.json, regenerated by scripts/gen_parameter_bank.py); load
it with ParameterBank.load_bundled. Components are matched to the
Fugacio database by CAS number, never by name, so document-local naming
quirks ("water (H2O)") cannot mis-assign a dataset.
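Matching by CAS number amounts to a dictionary lookup that ignores the document's own names. In the sketch below, DATABASE_BY_CAS is a two-entry stand-in for a component database and match_by_cas is a hypothetical helper.

```python
# Tiny stand-in for a component database, keyed by CAS registry number.
DATABASE_BY_CAS = {"7732-18-5": "water", "64-17-5": "ethanol"}


def match_by_cas(compounds):
    """Map document compounds to database names using only their CAS numbers."""
    names = []
    for compound in compounds:
        cas = compound.get("cas")
        if cas not in DATABASE_BY_CAS:
            raise KeyError(f"no database component with CAS {cas!r}")
        names.append(DATABASE_BY_CAS[cas])
    return tuple(names)


document_compounds = [
    {"name": "water (H2O)", "cas": "7732-18-5"},  # document-local spelling
    {"name": "EtOH", "cas": "64-17-5"},
]
matched = match_by_cas(document_compounds)
```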
Classes:
| Name | Description |
|---|---|
| FittedBinary | A fitted binary interaction record with its provenance. |
| ParameterBank | A lookup table of fitted binary parameters, keyed by component pair. |
Functions:
| Name | Description |
|---|---|
| fit_vle_dataset | Fit binary NRTL b parameters to one parsed P-x VLE dataset. |
| fit_bundled_samples | Fit every binary P-x VLE dataset among the bundled ThermoML samples. |
FittedBinary
dataclass
¶
FittedBinary(
components: tuple[str, str],
model: str,
alpha: float,
b12: float,
b21: float,
t_min: float,
t_max: float,
n_points: int,
rmse: float,
source: str,
)
A fitted binary interaction record with its provenance.
Attributes:
| Name | Type | Description |
|---|---|---|
| components | tuple[str, str] | Database names |
| model | str | Activity-model family (currently always |
| alpha | float | The fixed NRTL non-randomness parameter used in the fit. |
| b12, b21 | float | Fitted |
| t_min, t_max | float | Temperature range of the underlying data (K). |
| n_points | int | Number of data rows fitted. |
| rmse | float | Root-mean-square residual of |
| source | str | Provenance string (sample name or document citation). |
Methods:
| Name | Description |
|---|---|
| nrtl | The fitted model as a ready-to-evaluate two-component NRTL. |
ParameterBank
¶
ParameterBank(entries: list[FittedBinary])
A lookup table of fitted binary parameters, keyed by component pair.
Pairs are stored orientation-free: get and nrtl accept the
components in either order and return parameters oriented as requested
(swapping b12/b21 when needed).
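Orientation-free storage can be sketched by keying each record on the unordered pair and swapping the two coefficients on the way out. Record and Bank are hypothetical, pared-down stand-ins for FittedBinary and ParameterBank, and the numbers are illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    components: tuple
    b12: float
    b21: float


class Bank:
    def __init__(self, entries):
        # frozenset keys make ("a", "b") and ("b", "a") the same pair.
        self._by_pair = {frozenset(e.components): e for e in entries}

    def get(self, component_1, component_2):
        rec = self._by_pair.get(frozenset((component_1, component_2)))
        if rec is None or rec.components == (component_1, component_2):
            return rec
        # Stored the other way round: swap the coefficients to reorient.
        return replace(
            rec,
            components=(component_1, component_2),
            b12=rec.b21,
            b21=rec.b12,
        )


bank = Bank([Record(("ethanol", "water"), 100.0, 500.0)])  # illustrative values
as_stored = bank.get("ethanol", "water")
reoriented = bank.get("water", "ethanol")
```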
Methods:
| Name | Description |
|---|---|
| get | The record for a pair, reoriented so component_1 is component 1. |
| nrtl | A ready NRTL model for the pair, in the requested component order. |
| to_json | Serialize the bank (sorted, human-diffable). |
| from_json | Inverse of to_json. |
| load_bundled | The bank fitted to the bundled ThermoML samples (ships with the package). |
Attributes:
| Name | Type | Description |
|---|---|---|
| entries | list[FittedBinary] | All records, sorted by component pair for stable iteration. |
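A sorted, diff-friendly serialisation can be sketched with the standard json module. The record layout below is an assumption for illustration, not the format of the bundled parameter_bank.json, and the function names are hypothetical.

```python
import json


def bank_to_json(records):
    """Serialise records sorted by component pair, with stable key order."""
    ordered = sorted(records, key=lambda r: tuple(r["components"]))
    return json.dumps(ordered, indent=2, sort_keys=True)


def bank_from_json(text):
    return json.loads(text)


records = [  # illustrative values only
    {"components": ["ethanol", "water"], "b12": 1.0, "b21": 2.0},
    {"components": ["acetone", "water"], "b12": 3.0, "b21": 4.0},
]
text = bank_to_json(records)
```

Sorting both the records and the keys means that regenerating the file changes only the lines whose values actually changed.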
entries
property
¶
entries: list[FittedBinary]
All records, sorted by component pair for stable iteration.
get
¶
get(
component_1: str, component_2: str
) -> FittedBinary | None
The record for a pair, reoriented so component_1 is component 1.
nrtl
¶
A ready NRTL model for the pair, in the requested component order.
Raises:
| Type | Description |
|---|---|
| KeyError | if the bank has no record for the pair. |
load_bundled
classmethod
¶
load_bundled() -> ParameterBank
The bank fitted to the bundled ThermoML samples (ships with the package).
fit_vle_dataset
¶
fit_vle_dataset(
data: ThermoMLData,
dataset: Dataset,
*,
alpha: float = 0.3,
source: str = "",
max_iter: int = 80,
) -> FittedBinary
Fit binary NRTL b parameters to one parsed P-x VLE dataset.
The dataset must be binary with a temperature column, a composition column for its first component, and a pressure column (the layout of every bundled VLE sample and of archive isothermal P-x tables).
Returns:
| Type | Description |
|---|---|
| FittedBinary | The |
| FittedBinary | dataset's first component. |
Raises:
| Type | Description |
|---|---|
| ValueError | if the dataset is not a binary P-x table. |
| KeyError | if a compound cannot be matched to the component database. |
fit_bundled_samples
¶
fit_bundled_samples(
names: list[str] | None = None, *, alpha: float = 0.3
) -> list[FittedBinary]
Fit every binary P-x VLE dataset among the bundled ThermoML samples.
Non-VLE samples (pure-component tables, missing columns) are skipped, so the driver can be pointed at the whole sample directory. This is the batch regression behind the bundled parameter bank.
ThermoML reader¶
thermoml
¶
Reader for the NIST ThermoML archive XML format.
ThermoML (https://www.nist.gov/mml/acmd/trc/thermoml) is the IUPAC/NIST XML
standard for thermophysical and thermochemical property data; the freely
redistributable ThermoML Archive
(https://www.nist.gov/mml/acmd/trc/thermoml/thermoml-archive) holds tens of
thousands of experimental datasets. This module turns those files into tidy,
typed tables you can feed straight into fugacio.thermo.regression, so a
model can be fitted to real measurements, and predictions graded against them.
The parser is deliberately tolerant and dependency-free (standard-library
xml.etree.ElementTree only):
- XML namespaces are stripped, so files declaring the ThermoML namespace (or none) parse identically;
- compounds, mixtures, variables, properties, and the numeric value rows are read by their local element names, matching the published schema without binding to a specific version;
- each Dataset exposes its columns as aligned numeric rows plus convenience accessors (Dataset.temperature, Dataset.pressure, Dataset.mole_fraction) with pressure unit conversion to pascal.
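Namespace stripping with the standard library can be sketched as below. It shows the general technique, not necessarily the parser's own code: strip_namespaces is a hypothetical helper and the namespace URI in the sample string is a placeholder.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Rewrite '{namespace}local' tags to 'local' in place and return root."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
    return root


# The same minimal document, with and without a default namespace.
WITH_NS = (
    '<DataReport xmlns="http://example.org/ns">'
    "<Compound><sCommonName>water</sCommonName></Compound>"
    "</DataReport>"
)
WITHOUT_NS = (
    "<DataReport><Compound><sCommonName>water</sCommonName></Compound></DataReport>"
)

names = [
    strip_namespaces(ET.fromstring(doc)).findtext("Compound/sCommonName")
    for doc in (WITH_NS, WITHOUT_NS)
]
```

After stripping, lookups by local element name work the same way for both documents.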
A couple of small, schema-faithful datasets ship with the package for tests and
examples; see list_samples / load_sample.
Classes:
| Name | Description |
|---|---|
| Compound | A chemical compound declared in a ThermoML document. |
| Column | One variable or property column of a Dataset table. |
| Dataset | A PureOrMixtureData block: a table of measurements for one mixture. |
| ThermoMLData | A parsed ThermoML document: its compounds, datasets, and citation. |
Functions:
| Name | Description |
|---|---|
| loads | Parse a ThermoML document from an in-memory string or bytes. |
| read_thermoml | Parse a ThermoML document from a path or open file object. |
| list_samples | Names (without extension) of the bundled ThermoML sample datasets. |
| sample_path | Filesystem path of a bundled sample (with or without the .xml suffix). |
| load_sample | Parse a bundled ThermoML sample by name (see list_samples). |
Compound
dataclass
¶
Compound(
org_num: int,
name: str | None = None,
formula: str | None = None,
cas: str | None = None,
inchikey: str | None = None,
)
A chemical compound declared in a ThermoML document.
Attributes:
| Name | Type | Description |
|---|---|---|
| org_num | int | The document-local organization number used to reference this compound from mixtures and composition variables. |
| name | str \| None | Common name, if given. |
| formula | str \| None | Molecular formula, if given. |
| cas | str \| None | CAS registry number, if present. |
| inchikey | str \| None | Standard InChIKey, if present. |
Column
dataclass
¶
One variable or property column of a Dataset table.
Attributes:
| Name | Type | Description |
|---|---|---|
| number | int | The |
| role | str | |
| kind | str | The ThermoML type element local name, e.g. |
| label | str | Human-readable label including units, e.g. |
| component | int \| None | For composition columns, the |
Dataset
dataclass
¶
Dataset(
components: tuple[int, ...],
columns: tuple[Column, ...],
rows: tuple[tuple[float, ...], ...],
phase: str | None = None,
number: int | None = None,
)
A PureOrMixtureData block: a table of measurements for one mixture.
The rows are aligned with columns; a missing cell is float('nan').
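The row/column alignment and the NaN convention can be illustrated with plain tuples. The labels and numbers below are made up for the example, and values and complete_rows are hypothetical helpers, not the Dataset methods.

```python
import math

# Made-up table: one tuple per row, one position per column.
columns = ("Temperature, K", "Mole fraction of component 1", "Pressure, kPa")
rows = (
    (323.15, 0.10, 18.2),
    (323.15, 0.50, float("nan")),  # missing pressure cell
    (323.15, 0.90, 29.1),
)


def values(rows, index):
    """All values of one column, in row order."""
    return [row[index] for row in rows]


def complete_rows(rows):
    """Rows with no missing cell, e.g. before passing data to a fitter."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


x1 = values(rows, columns.index("Mole fraction of component 1"))
usable = complete_rows(rows)
```

Because NaN compares unequal to everything, including itself, missing cells have to be detected with math.isnan rather than with ==.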
Attributes:
| Name | Type | Description |
|---|---|---|
| components | tuple[int, ...] | |
| columns | tuple[Column, ...] | The variable and property columns, in document order. |
| rows | tuple[tuple[float, ...], ...] | Numeric rows aligned with columns. |
| phase | str \| None | The reported phase string, if any (e.g. |
| number | int \| None | The |
Methods:
| Name | Description |
|---|---|
| values | All values of one column, in row order. |
| find_column | First column matching the given filters (any combination). |
| temperature | Temperatures in kelvin (raises if the dataset has no temperature column). |
| pressure | Pressures converted to unit (default pascal). |
| mole_fraction | Mole fractions of component (by org_num). |
| to_dict | The table as |
find_column
¶
find_column(
*,
kind: str | None = None,
quantity: str | None = None,
component: int | None = None,
) -> Column | None
First column matching the given filters (any combination).
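The "any combination" behaviour can be sketched as a scan in which every filter left at None matches everything. Col and find_column_sketch are hypothetical stand-ins, the quantity filter is omitted because its matching rule is not described here, and the kind and role strings are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Col:
    number: int
    role: str
    kind: str
    label: str
    component: Optional[int] = None


def find_column_sketch(columns, *, kind=None, component=None):
    """Return the first column satisfying every filter that is not None."""
    for col in columns:
        if kind is not None and col.kind != kind:
            continue
        if component is not None and col.component != component:
            continue
        return col
    return None


cols = (
    Col(1, "variable", "eTemperature", "Temperature, K"),
    Col(2, "variable", "eComponentComposition", "Mole fraction", component=1),
    Col(3, "property", "ePressure", "Pressure, kPa"),
)
pressure_col = find_column_sketch(cols, kind="ePressure")
missing = find_column_sketch(cols, kind="eComponentComposition", component=2)
```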
temperature
¶
Temperatures in kelvin (raises if the dataset has no temperature column).
pressure
¶
Pressures converted to unit (default pascal).
Accepts pressure stored either as a controlled variable (ePressure) or
as a measured property (e.g. a vapour-pressure column).
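Unit conversion of this kind can be sketched by reading the unit from the end of the column label. The "name, unit" label layout and the set of units handled are assumptions of this sketch, and pressure_in_pascal is a hypothetical helper.

```python
# Conversion factors to pascal for a few common pressure units.
TO_PASCAL = {"Pa": 1.0, "kPa": 1.0e3, "MPa": 1.0e6, "bar": 1.0e5}


def pressure_in_pascal(values, label):
    """Convert values whose unit follows the last comma of the label."""
    unit = label.rsplit(",", 1)[-1].strip()
    return [v * TO_PASCAL[unit] for v in values]


atmospheric = pressure_in_pascal([101.325], "Pressure, kPa")
one_bar = pressure_in_pascal([1.0], "Pressure, bar")
```

An unrecognised unit raises KeyError here instead of silently returning unconverted numbers.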
mole_fraction
¶
Mole fractions of component (by org_num).
ThermoMLData
dataclass
¶
ThermoMLData(
compounds: tuple[Compound, ...],
datasets: tuple[Dataset, ...],
citation: str | None = None,
)
A parsed ThermoML document: its compounds, datasets, and citation.
Methods:
| Name | Description |
|---|---|
| compound | The compound with the given |
| component_names | Best-effort names of a dataset's components (falls back to |
loads
¶
loads(text: str | bytes) -> ThermoMLData
Parse a ThermoML document from an in-memory string or bytes.
read_thermoml
¶
list_samples
¶
Names (without extension) of the bundled ThermoML sample datasets.
sample_path
¶
Filesystem path of a bundled sample (with or without the .xml suffix).
load_sample
¶
load_sample(name: str) -> ThermoMLData
Parse a bundled ThermoML sample by name (see list_samples).