Data-efficient prediction in tableting using word embeddings and empirically-guided neural networks

Dec. 3, 2025, midnight - by Najeeb Abdelrahman, Stefan Klinken-Uth

Developing reliable tablet formulations is often slowed by limited experimental data and the challenge of integrating categorical variables—such as API identity—into predictive models. Conventional regression offers clarity but struggles with complex nonlinear behaviors, while machine learning can improve accuracy at the cost of interpretability. This work introduces a neural network approach that overcomes these limitations by using word-embedding layers to encode categorical formulation components (e.g., APIs) as meaningful numerical vectors. Combined with empirically informed output functions and a deep-ensemble setup, the model predicts key tablet performance attributes—tensile strength, density, ejection force, and dosing height—using only formulation ratios, compression pressure, and tablet weight. The method delivers accuracy on par with or better than classical regression while avoiding non-physical predictions. The learned API embeddings form chemically and mechanically relevant clusters, enabling transfer learning and reliable predictions even for poorly characterized or low-data APIs. Additionally, information-gain analysis shows that low-dose formulations can significantly improve model performance, supporting leaner and more efficient experimental programs. Overall, this embedding-driven neural network framework provides a practical, interpretable, and data-efficient tool to accelerate tablet formulation development.
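The three ingredients named above (an embedding lookup for the API, an empirically shaped output function, and a deep ensemble) can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation. The API labels, embedding size, single linear layer, and the saturating form sigma = sigma_max * (1 - exp(-k * P)) for tensile strength are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random

EMBED_DIM = 3                              # assumed embedding size
API_NAMES = ["api_a", "api_b", "api_c"]    # hypothetical API labels


def softplus(x):
    """Smooth map from any real number to a non-negative value."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def make_member(seed):
    """One ensemble member: an embedding table plus a linear layer (untrained)."""
    rng = random.Random(seed)
    return {
        # One learnable vector per categorical API identity.
        "embedding": {name: [rng.gauss(0, 0.5) for _ in range(EMBED_DIM)]
                      for name in API_NAMES},
        # Two raw outputs from [embedding..., api_fraction, tablet_weight].
        "w": [[rng.gauss(0, 0.5) for _ in range(EMBED_DIM + 2)] for _ in range(2)],
        "b": [rng.gauss(0, 0.5) for _ in range(2)],
    }


def predict_member(member, api, api_fraction, tablet_weight_g, pressure_mpa):
    """Predict tensile strength (MPa) through an assumed saturating output function."""
    features = member["embedding"][api] + [api_fraction, tablet_weight_g]
    raw = [sum(w * x for w, x in zip(row, features)) + b
           for row, b in zip(member["w"], member["b"])]
    sigma_max = softplus(raw[0])        # plateau strength, constrained >= 0
    k = softplus(raw[1]) / 100.0        # rate constant in 1/MPa, constrained >= 0
    # The network predicts the parameters of the curve, not the value itself,
    # so the result is zero at zero pressure and never negative.
    return sigma_max * (1.0 - math.exp(-k * pressure_mpa))


def predict_ensemble(members, **inputs):
    """Mean and spread over members; the spread is a rough uncertainty proxy."""
    preds = [predict_member(m, **inputs) for m in members]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, math.sqrt(var)


ensemble = [make_member(seed) for seed in range(5)]
mean, spread = predict_ensemble(
    ensemble, api="api_a", api_fraction=0.2, tablet_weight_g=0.3, pressure_mpa=150.0
)
```

Because the output layer only sets the parameters of a bounded curve, non-physical values such as negative tensile strength are ruled out by construction. In a trained model the embedding vectors would be fitted jointly with the other weights, which is what would allow similar APIs to end up close together.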