Research with Grit
Publications
Open, rigorous, and meaningful work.
Revisiting Training Scale: An Empirical Study of Token Count, Power Consumption, and Parameter Efficiency
2026. Dissertation follow-up study.
Read on arXiv
Abstract
Research in machine learning has questioned whether increases in training token counts reliably produce proportional performance gains in large language models. Building on prior work introducing an energy-aware parameter efficiency metric, this study empirically examines the effects of increasing training token counts under fixed hardware and training conditions. The significance of this work lies in the explicit integration of power consumption and execution duration, as reflected by the power sampling frequency, into token-scale analysis. This addresses a gap in prior studies emphasizing performance outcomes while underrepresenting computational and energy costs. Using a repeated-measures experimental design on a constant GPU instance with an identical model architecture, optimizer settings, and epoch counts, a 1.1-billion-parameter TinyLlama model was trained at three token counts (500K, 1M, and 2M). While conventional performance metrics exhibited inconsistent or diminishing returns across token scales, the inclusion of power consumption and execution duration revealed a strictly monotonic decline in training efficiency as token count increased. Repeated-measures ANOVA demonstrated a strong effect of token count on parameter efficiency (F(2, 98) = 24,268.23, p < .001, ηg² = .997), with all pairwise comparisons remaining significant following the Bonferroni correction. These findings indicate that increases in training token counts may be energetically inefficient even when marginal performance improvements are observed, underscoring the importance of efficiency-aware evaluation in large language model training.
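For readers who want to see how an energy-aware parameter efficiency metric of this kind could be operationalized, here is a minimal Python sketch. It assumes one plausible form of the metric, performance normalized by parameter count and by energy cost estimated from sampled power draw and execution duration; the formula, names, and sample values are illustrative and are not the exact definition used in the paper.

# Illustrative sketch only; the paper's exact metric is not reproduced here.
# Assumed form: efficiency = performance / (parameters_in_billions * energy_in_kWh),
# with energy estimated from GPU power samples taken at a fixed interval.

def energy_kwh(power_samples_w, sample_interval_s):
    # Integrate sampled power (watts) over time, then convert joules to kWh.
    joules = sum(p * sample_interval_s for p in power_samples_w)
    return joules / 3.6e6

def parameter_efficiency(performance, params_billions, power_samples_w, sample_interval_s):
    # Performance per billion parameters per kWh (an assumed definition).
    return performance / (params_billions * energy_kwh(power_samples_w, sample_interval_s))

# Hypothetical run: a 1.1B-parameter model, power sampled at 1 Hz for one hour at ~250 W.
samples = [250.0] * 3600
print(parameter_efficiency(performance=0.62, params_billions=1.1,
                           power_samples_w=samples, sample_interval_s=1.0))

Under these assumed numbers, longer or more power-hungry runs lower the efficiency score even when the performance term is unchanged, which is the intuition behind the monotonic decline reported above.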
This is How We Eat
Forthcoming 2026.
Final manuscript in preparation. Publication link coming soon.
A Quantitative Experimental Study on Exploring Parameter Efficiency with Varying Token Counts in TinyLlama Training
Doctoral dissertation, 2025. Published in ProQuest.
Abstract
The rapid growth of large language models has escalated compute costs and environmental impacts, driven by training methods that scale datasets linearly with model size. This produces diminishing returns and limits accessibility for smaller organizations and independent researchers. The problem addressed in this study was that linearly scaled datasets restrict scalability and efficiency across the research ecosystem. The purpose of this quantitative experimental study was to examine whether training token count affects parameter efficiency when model size remains fixed. Grounded in scaling laws and compute-optimal refinements, the study investigated token-to-parameter ratios as a factor shaping training outcomes. A repeated measures design was employed using TinyLlama with 1.1 billion parameters trained under three token count conditions of 500,000, 1,000,000, and 2,000,000. Training was conducted on an Amazon Web Services SageMaker notebook instance configured to simulate a low-power edge device. Data were analyzed using repeated measures analysis of variance with Bonferroni corrections. Results showed a statistically significant effect of token count on parameter efficiency, F(2, 98) = 77.3166, p < .001, η² = .5268. The 2,000,000-token condition differed significantly from the 500,000 and 1,000,000 conditions, though its mean parameter efficiency was lower. Conclusions suggest refining token scaling strategies may reduce compute demands and expand research access. Future studies should explore additional token intervals and energy efficiency metrics to strengthen sustainable training practices.
Keywords: large language models, parameter efficiency, compute overhead, token scaling, repeated measures ANOVA
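As a companion to the analysis described in this abstract, the sketch below shows one common way to run a repeated-measures ANOVA with Bonferroni-corrected pairwise comparisons in Python. The synthetic data, column names, and use of statsmodels and SciPy are assumptions for illustration; this is not the dissertation's analysis code or data.

# Illustrative sketch, not the dissertation's analysis script or data.
# Assumes a long-format table: one parameter-efficiency value per run per token condition.
from itertools import combinations
import numpy as np
import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
runs = 50  # 50 runs per condition yields the F(2, 98) degrees of freedom reported above
conditions = ["500K", "1M", "2M"]
df = pd.DataFrame({
    "run": np.tile(np.arange(runs), len(conditions)),
    "tokens": np.repeat(conditions, runs),
    "efficiency": np.concatenate(
        [rng.normal(loc, 0.05, runs) for loc in (1.00, 0.85, 0.60)]  # synthetic values
    ),
})

# Repeated-measures ANOVA: does token condition affect parameter efficiency?
print(AnovaRM(df, depvar="efficiency", subject="run", within=["tokens"]).fit())

# Bonferroni-corrected pairwise comparisons via paired t-tests between conditions.
pairs = list(combinations(conditions, 2))
for a, b in pairs:
    x = df.loc[df.tokens == a, "efficiency"].to_numpy()
    y = df.loc[df.tokens == b, "efficiency"].to_numpy()
    t, p = ttest_rel(x, y)
    print(f"{a} vs {b}: t = {t:.2f}, Bonferroni-adjusted p = {min(p * len(pairs), 1.0):.4g}")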
Technology and stress: A qualitative case study with Gen Z workers
Published 2023. Over 800 reads on ResearchGate.
Read online