Unseen, Unsaid, Unchallenged: Gender Bias in Generative AI Science Outputs
Victoria Hedlund, Founder and Director aka 'The AI Bias Girl', GenEd Labs.ai
This session built on my research in SSR and Primary Science, which examines how the use of generative AI could affect science learning. The past few weeks have seen a clear drive for GenAI tutors to be adopted in the name of promoting equity. This session tested that claim and suggested the opposite may occur, potentially creating curriculum bubbles and reducing the quality of the scientific content a child receives when using GenAI.
Science is often positioned as objective and neutral, yet this session explored whether generative AI systems preserve that neutrality when asked to explain the same scientific idea to different audiences. I presented findings examining how gender bias emerges in generative AI science explanations. Using a simple benchmark (asking an AI system to explain how a light bulb works), the study investigated how small changes in prompt structure, particularly the inclusion and positioning of gender, affected the resulting explanations. Four prompt variants were tested, producing a dataset of 100 paired outputs.
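To make the design concrete, the sketch below shows how such prompt variants might be assembled and collected in Python. It is illustrative only: the exact wording of the variants, the variant names, and the call_model() helper are assumptions for demonstration, not the study's actual materials.

```python
# Illustrative sketch only: prompt wordings and the call_model() helper are
# assumptions for demonstration, not the study's actual materials.

GENDERS = ("girl", "boy")

def prompt_variants(gender: str) -> dict[str, str]:
    """Four hypothetical variants differing in whether and where gender appears."""
    return {
        "no_gender":     "Explain how a light bulb works.",
        "gender_first":  f"I am a {gender}. Explain how a light bulb works.",
        "request_first": f"Explain how a light bulb works. I am a {gender}.",
        "addressed_to":  f"Explain to a {gender} how a light bulb works.",
    }

def collect_pairs(call_model, n_pairs_per_variant: int) -> list[dict]:
    """Collect paired girl/boy outputs for each variant.

    call_model(prompt) -> str is a user-supplied wrapper around any LLM API.
    """
    records = []
    for i in range(n_pairs_per_variant):
        for name in prompt_variants("x"):          # iterate over variant names only
            record = {"pair_id": i, "variant": name}
            for gender in GENDERS:
                record[gender] = call_model(prompt_variants(gender)[name])
            records.append(record)
    return records
```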
Outputs were analysed across three dimensions: word count, tiered vocabulary use, and the presence of stereotype through narrative framing and metaphor. The analysis showed consistent patterns. The gender named first in the prompt tended to receive longer, more scaffolded explanations, reflecting known positional and primacy effects in large language models. Explanations addressed to girls contained higher proportions of everyday and narrative language, while explanations addressed to boys were more likely to include subject-specific scientific terminology.
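As a rough indication of how the first two measures might be computed for a single pair of outputs, the sketch below counts words and estimates the share of tokens drawn from each vocabulary tier. The tier word lists here are placeholders, not the study's actual vocabulary tiers.

```python
import re

# Placeholder tier word lists: illustrative only, not the study's actual tiers.
TIER2_WORDS = {"compare", "observe", "describe", "predict"}                 # general academic
TIER3_WORDS = {"filament", "current", "circuit", "resistance", "electron"}  # subject-specific

def tokenise(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def explanation_metrics(text: str) -> dict:
    """Word count plus the share of tokens falling in each vocabulary tier."""
    tokens = tokenise(text)
    n = len(tokens) or 1                 # avoid division by zero on empty output
    return {
        "word_count": len(tokens),
        "tier2_share": sum(t in TIER2_WORDS for t in tokens) / n,
        "tier3_share": sum(t in TIER3_WORDS for t in tokens) / n,
    }

def compare_pair(girl_text: str, boy_text: str) -> dict:
    """Side-by-side metrics for one paired output, to surface length or
    terminology gaps between the two explanations."""
    return {"girl": explanation_metrics(girl_text),
            "boy": explanation_metrics(boy_text)}
```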
Stereotyping emerged through contextualisation. Explanations for girls were more frequently situated in domestic, aesthetic, or imaginative contexts, while explanations for boys drew more heavily on mechanical or action-based metaphors. Crucially, the structure of the prompt mattered: placing gender before the request increased the likelihood of stereotyped framing, while placing the request first reduced it.
The session argued that these differences present an equity risk as generative AI becomes embedded in classrooms, learning platforms, and AI tutors. If learners receive systematically different explanations of the same scientific content, yet are assessed against the same criteria, disparities in epistemic access may be introduced or amplified.
The session concluded by emphasising that prompt design is not a neutral technical detail but a pedagogical decision. Practical mitigation strategies were shared, including restructuring prompts, using counterfactual checks, and minimising unnecessary personal identifiers. The central message was that educators need both awareness and agency to critically oversee AI-generated science explanations, ensuring that emerging technologies support, rather than undermine, equity in science education.
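A minimal sketch of what a counterfactual check might look like in practice follows, assuming the same hypothetical call_model() wrapper as above: the same request is issued with the identifier as given, with it swapped, and with it removed, and the outputs are compared before being used with learners.

```python
def counterfactual_check(call_model, request: str, identifier: str, swap: str) -> dict:
    """Issue the same request with an identifier, with it swapped, and with it
    removed, then report word counts as a quick first-pass comparison.

    call_model(prompt) -> str is a user-supplied wrapper around any LLM API.
    """
    prompts = {
        "original": f"{request} I am a {identifier}.",
        "swapped":  f"{request} I am a {swap}.",
        "stripped": request,                     # unnecessary identifier removed
    }
    outputs = {name: call_model(p) for name, p in prompts.items()}
    word_counts = {name: len(text.split()) for name, text in outputs.items()}
    return {"outputs": outputs, "word_counts": word_counts}

# Example use (call_model must be supplied by the reader):
# report = counterfactual_check(call_model,
#                               "Explain how a light bulb works.",
#                               identifier="girl", swap="boy")
```

A word-count comparison is only a first-pass flag; the point of the check is that an educator reads the paired outputs side by side before relying on either of them.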