AI tools are reshaping how we observe everything from the world around us to distant stars. In a recent study, an international research team demonstrated that deep learning and large language models (LLMs) can help astronomers classify stars with impressive accuracy and efficiency.
Their work, titled “Deep Learning and Methods Based on Large Language Models Applied to Stellar Light Curve Classification,” was published in Intelligent Computing, a Science Partner Journal.
At the heart of the study is the StarWhisper LightCurve series, a suite of three AI models designed to classify variable stars based on their light curves. These models were benchmarked against other leading approaches and trained using automated deep learning methods. Automated deep learning takes over tasks such as tuning learning rates, batch sizes, and model complexity, reducing the need for manual fine-tuning.
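To make the idea of automated tuning concrete, here is a minimal sketch of a random search over learning rate, batch size, and model size. It is not the team's pipeline: the search ranges, the `toy_validation_score` function, and all other names are assumptions made for illustration, and the scoring function is a made-up stand-in for actually training a network.

```python
import math
import random

# Hypothetical search space; the ranges are illustrative, not taken from the study.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-1),      # sampled log-uniformly
    "batch_size": [16, 32, 64, 128],
    "hidden_units": [32, 64, 128, 256],
}


def sample_config(rng):
    """Draw one random configuration from the search space."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "hidden_units": rng.choice(SEARCH_SPACE["hidden_units"]),
    }


def toy_validation_score(config):
    """Stand-in for 'train the model and return validation accuracy'.

    A real run would train a network here. This made-up function simply
    peaks near learning_rate=1e-3 and hidden_units=128 and ignores batch size.
    """
    lr_penalty = abs(math.log10(config["learning_rate"]) + 3) * 0.05
    size_penalty = abs(math.log2(config["hidden_units"]) - 7) * 0.02
    return max(0.0, 0.95 - lr_penalty - size_penalty)


def random_search(n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, -1.0
    for _ in range(n_trials):
        config = sample_config(rng)
        score = toy_validation_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(n_trials=50)
print(best_config, round(best_score, 3))
```

Random search is only one of several strategies such tools can use; it is shown here because it fits in a few lines.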
The training data came from NASA’s Kepler and K2 missions, focusing on five major types of variable stars. To improve generalization, the team also included a small number of rare star types.
Results from the evaluation show consistently high accuracy across multiple model architectures. One standout was the Conv1D + BiLSTM model, a hybrid that combines convolutional layers for feature extraction with recurrent layers for detecting temporal patterns, which reached 94 % accuracy. The Swin Transformer, a vision model whose transformer design originated in natural language processing, performed even better, with 99 % accuracy overall.
More notably, the Swin Transformer correctly identified Type II Cepheid stars—an especially rare class of pulsating stars that account for just 0.02 % of the dataset—with 83 % accuracy.
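Results on a class this rare are worth reporting separately because overall accuracy can hide them entirely. The sketch below uses made-up labels, not the study's data, to show a classifier that never predicts the rare class and still scores almost perfectly overall. The function names are hypothetical helpers.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that were labeled correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its members that were found."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Synthetic labels: 9,998 common stars and 2 rare ones (0.02 % of 10,000).
y_true = ["common"] * 9998 + ["rare"] * 2
# A lazy classifier that never predicts the rare class.
y_pred = ["common"] * 10000

print(overall_accuracy(y_true, y_pred))   # 0.9998
print(per_class_recall(y_true, y_pred))   # {'common': 1.0, 'rare': 0.0}
```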
While the Swin Transformer’s accuracy is impressive, it requires extra steps to convert light curve data into images. In comparison, the StarWhisper LightCurve models achieved nearly 90 % accuracy with far less manual input, reducing the need for explicit feature engineering. This makes them well-suited for scaling up data analysis and exploring multi-modal AI applications in astronomy.
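As a rough picture of what converting a light curve into an image involves, the sketch below rasterizes time and flux samples onto a small pixel grid. The grid size, the synthetic sinusoidal star, and the `rasterize_light_curve` helper are all assumptions for illustration; the preprocessing used in the study may differ.

```python
import math


def rasterize_light_curve(times, fluxes, width=32, height=16):
    """Turn (time, flux) samples into a height x width grid of 0/1 pixels."""
    t_min, t_max = min(times), max(times)
    f_min, f_max = min(fluxes), max(fluxes)
    t_span = (t_max - t_min) or 1.0   # avoid division by zero for constant input
    f_span = (f_max - f_min) or 1.0
    image = [[0] * width for _ in range(height)]
    for t, f in zip(times, fluxes):
        col = min(width - 1, int((t - t_min) / t_span * width))
        row = min(height - 1, int((f - f_min) / f_span * height))
        image[height - 1 - row][col] = 1   # row 0 is the top of the image
    return image


# Synthetic sinusoidal "variable star" with a made-up 2.5-day period.
times = [i * 0.1 for i in range(200)]
fluxes = [1.0 + 0.1 * math.sin(2 * math.pi * t / 2.5) for t in times]

image = rasterize_light_curve(times, fluxes)
for row in image:
    print("".join("#" if px else "." for px in row))
```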
The StarWhisper LightCurve series includes three specialized large language models, each tailored to a different data format:
- A time-series classification model built on Gemma 7B for processing structured light curve data as text.
- A multimodal model based on DeepSeek-VL-7B-Chat for interpreting image-based light curve formats.
- An audio model using Qwen-Audio to analyze light curves converted into sound waves.
These models are part of the broader StarWhisper project, which aims to build an astronomy-focused LLM with strong reasoning and instruction-following abilities.
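The text and audio formats listed above can be pictured with a small sketch. The `time:flux` text layout, the sample rate, and both helper functions are illustrative assumptions, not the encodings used by the StarWhisper LightCurve models.

```python
import io
import math
import struct
import wave


def light_curve_to_text(times, fluxes, decimals=3):
    """Serialize samples as space-separated 'time:flux' pairs (assumed format)."""
    return " ".join(
        f"{t:.{decimals}f}:{f:.{decimals}f}" for t, f in zip(times, fluxes)
    )


def light_curve_to_wav_bytes(fluxes, sample_rate=8000):
    """Encode fluxes as mono 16-bit PCM audio, one audio sample per flux value."""
    f_min, f_max = min(fluxes), max(fluxes)
    span = (f_max - f_min) or 1.0   # avoid division by zero for constant input
    # Rescale flux to [-1, 1], then to the signed 16-bit range.
    pcm = [int(((f - f_min) / span * 2 - 1) * 32767) for f in fluxes]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()


# Synthetic sinusoidal light curve with a made-up 2.5-day period.
times = [i * 0.1 for i in range(200)]
fluxes = [1.0 + 0.1 * math.sin(2 * math.pi * t / 2.5) for t in times]

text = light_curve_to_text(times, fluxes)
wav_bytes = light_curve_to_wav_bytes(fluxes)
print(text[:60])
print(len(wav_bytes), "bytes of WAV data")
```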
Journal Reference:
Li, Y., et al. (2025) Deep Learning and Methods Based on Large Language Models Applied to Stellar Light Curve Classification. Intelligent Computing. doi.org/10.34133/icomputing.0110