Bias in AI and Its Impact on Education: From Images to Admissions

A few days ago, I was creating some art for my parenting website GoFatherhood. Some of my older articles came to my attention, and I discovered that they’re sorely lacking in visuals. It’s a microcosm of the recent sea change in image creation: when I wrote those articles years ago, I would have had to create the art myself, hire an artist, or dig through free stock image archives like Pixabay to find something usable.

Nowadays, with generative AI tools at the ready, producing art is as easy as typing the prompt “create a colored pencil sketch of a father walking a young child in the park.” Here’s how Adobe Firefly, my current favorite gen AI art tool, rendered that prompt:

Very nice, right? As generative AI software has rapidly improved, the output of these programs has gone from awkward and cringy to impressive – and sometimes, downright beautiful.

But… why are both people Caucasian? I didn’t specify a race. I then tried Stable Diffusion, generally known for its more sophisticated output. It returned a couple of images, giving it more opportunity to reflect the diversity of our world:

This is better, but look closely: both children are again Caucasian, and in every image from both tools, the children all happen to be boys. Are these programs therefore biased?
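If you’d like to run this experiment at more than anecdotal scale, the open-weight tools make it straightforward. Here’s a minimal sketch, assuming the Hugging Face diffusers library and an open Stable Diffusion checkpoint (not the exact setup behind Firefly or any hosted tool), that generates a batch of images from the same deliberately neutral prompt so you can see for yourself how often the same demographics appear:

```python
# Minimal sketch: batch-generate images from one neutral prompt so any
# demographic skew can be inspected by eye. Assumes the Hugging Face
# "diffusers" library and an open Stable Diffusion checkpoint; the hosted
# tools mentioned above (Firefly, Ideogram) work differently.
import torch
from diffusers import StableDiffusionPipeline

PROMPT = "a colored pencil sketch of a father walking a young child in the park"
NUM_IMAGES = 16  # enough samples to reveal a pattern, small enough to run locally

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint; any SD model works
    torch_dtype=torch.float16,
).to("cuda")

for i in range(NUM_IMAGES):
    # A fixed seed per image keeps the batch reproducible when you re-check it.
    generator = torch.Generator("cuda").manual_seed(i)
    image = pipe(PROMPT, generator=generator).images[0]
    image.save(f"father_park_{i:02d}.png")
```

A dozen or so samples is usually enough for the pattern, or the lack of one, to be obvious at a glance.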

Systemic Bias and Representation

Let me start by observing that the bias on display here isn’t coded into the AI software itself; it comes from the dataset used for training. The images that were scanned, disassembled, and analyzed are a skewed collection of mostly Caucasian men, women, and children. As a result, “father” is rendered as young, white, slim, and fully abled. Even the park imagery skews toward the sort of place you’d find in a wealthy American suburb, which is certainly not the only way to interpret the word “park.”
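How would you verify that the training data is the culprit? You generally can’t inspect a commercial tool’s dataset, but the audit itself is simple. Here’s a minimal sketch, assuming a hypothetical captions.csv of image captions (a stand-in, not any vendor’s real training set), that counts which descriptive words co-occur with “father” or “dad”:

```python
# Minimal sketch of a training-data audit. Assumes a hypothetical captions.csv
# with one image caption per row under a "caption" column (a stand-in dataset,
# not any vendor's real training set). A heavily skewed count hints at how
# "father" will end up being rendered.
import csv
from collections import Counter

DESCRIPTORS = {"white", "black", "asian", "latino", "young", "old",
               "disabled", "wheelchair", "suburban", "urban", "rural"}

counts = Counter()
with open("captions.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        words = set(row["caption"].lower().split())
        if "father" in words or "dad" in words:
            counts.update(words & DESCRIPTORS)

for word, n in counts.most_common():
    print(f"{word:12s} {n}")
```

This is the toy version of the caption-level audits researchers run on large public image datasets.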

One aspect of systemic AI bias that matters enormously in education is representation. All students want to see themselves in teaching materials, from textbooks and worksheets to coloring pages and lecture slides. Ask AI for an image of “students in a college classroom learning chemistry,” and the rendered image will likely show a room of clean-shaven young white men led by an older white male teacher. That’s not representative of modern college classrooms anywhere in the world.

This is a more widespread issue than questionable image creation, however. Bias shows up throughout generative AI because of the inherent bias of the training data. Even the sum of everything ever published on the web or posted to social media is a non-representative sample of humanity: what about the people who don’t publish online, or who can’t afford a computer, mobile phone, or internet connectivity?

Bias and College Admissions

Imagine a modern college admissions office using AI to screen applications so it can get through thousands (or tens of thousands) of hopeful applicants. Based on its training data, the software likely carries subtle biases toward the characteristics and backgrounds of students who have historically been admitted to the school. One example: valuing some extracurricular activities but not others.

Meanwhile, prospective students with atypical backgrounds, the ones who offer more diverse voices and experiences, are invisibly filtered out. That’s problematic. Feed the accepted students’ data back into the training set, and the AI’s biases are reinforced further, producing an increasingly homogeneous campus even as the stated intention is to increase diversity.
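To see how quickly that feedback loop can snowball, here’s a toy simulation: an illustrative sketch, not a model of any real admissions system. The “AI screen” scores each applicant by how familiar their profile looks given past admits, plus noise standing in for everything else in the application, admits the top slice, and then retrains on its own admits:

```python
# Toy simulation of the feedback loop described above; an illustrative sketch,
# not a model of any real admissions system. The "AI screen" scores each
# applicant by how familiar their profile looks given past admits, plus noise
# standing in for everything else in an application, admits the top slice,
# and then retrains on its own admits.
import random

random.seed(42)
NOISE = 0.3    # everything in an application besides the profile
SEATS = 200    # admits per cycle
POOL = 500     # applicants of each profile per cycle

past_admits = ["typical"] * 70 + ["atypical"] * 30  # history: 30% atypical

for cycle in range(1, 6):
    familiarity = {
        "typical": past_admits.count("typical") / len(past_admits),
        "atypical": past_admits.count("atypical") / len(past_admits),
    }
    applicants = ["typical"] * POOL + ["atypical"] * POOL
    scored = [(familiarity[p] + random.gauss(0, NOISE), p) for p in applicants]
    scored.sort(reverse=True)
    admits = [profile for _, profile in scored[:SEATS]]
    past_admits = admits  # the admits become next cycle's training data
    share = admits.count("atypical") / SEATS
    print(f"cycle {cycle}: {share:.1%} of admits have atypical profiles")
```

Starting from a class that’s 30 percent atypical, the atypical share of admits collapses toward zero within a few cycles, even though the applicant pool is perfectly balanced every single year.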

Now imagine a startup offering AI-powered college application help to high school seniors so that diverse and non-traditional applicants can make it through the AI filtering. The tool analyzes millions of college applications, identifies the common characteristics of accepted students at each institution, and injects those characteristics into its clients’ applications, creating a skewed and inaccurate representation of each student. But hey, it works: the clients receive more acceptance letters than they did (or would have) without the AI assistance. But at what cost?

This is exactly what’s happening with job applicants today, as any hiring manager will tell you. Since hiring isn’t their primary task, busy managers lean on AI software to screen an ever-growing applicant pool, one made up of professionals who are themselves using AI to tune their resumes for the AI filters. It isn’t working very well, particularly for companies seeking creative, atypical candidates.

How Can We Identify Dataset Bias?

The issue we’re circling around is that AI inevitably perpetuates the biases, social prejudices, and historical disparities baked into its training datasets, but in extremely subtle ways. If our fictional college application startup’s dataset consists of the applications received by an Ivy League school, how representative is that of the entire population of college students across the country, let alone worldwide?

Researchers have been developing fairness algorithms and analyzing the diversity of training datasets to combat this problem, but the entire concept of fairness is inherently subjective. If I prompt my favorite gen AI image utility for an image of a “college classroom, students writing quietly in their notebooks while the teacher reads a book at the front of the class,” what type of image would you expect to be generated?

And what if I added “in Soweto,” “in Osaka,” “in Mumbai,” or “in Auckland”? What if I specified that it was an “honors” or “remedial” class?
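One simple way to put numbers on that intuition, and a miniature version of what fairness researchers do, is to generate a batch of images for each prompt variant, hand-label what you see, and compare the shares. The sketch below assumes a hypothetical annotations.csv with “prompt_variant” and “perceived_ethnicity” columns that you would fill in yourself; no image tool exports anything like this:

```python
# Minimal sketch of a representation check across prompt variants. Assumes a
# hypothetical annotations.csv that you fill in by hand-labeling generated
# images, with "prompt_variant" and "perceived_ethnicity" columns; no image
# tool exports anything like this automatically.
import csv
from collections import Counter, defaultdict

by_variant = defaultdict(Counter)
with open("annotations.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        by_variant[row["prompt_variant"]][row["perceived_ethnicity"]] += 1

for variant, counts in sorted(by_variant.items()):
    total = sum(counts.values())
    shares = ", ".join(f"{label}: {n / total:.0%}" for label, n in counts.most_common())
    print(f"{variant:12s} {shares}")
```

The same two-column tally extends to gender, age, or visible disability. The hard part isn’t the counting; it’s deciding what the “fair” shares should be, which is exactly the subjectivity problem described above.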

Another excellent image generation tool is Ideogram. I tested it with the classroom prompt above, adding “in Mumbai”:

Is that a fair representation? In this instance, what does accuracy even mean? Notice that there’s no ethnic or ability diversity in that classroom, nor do the “Osaka” or “Auckland” images show diversity in ethnicity, socioeconomics, or physical ability. Earlier I highlighted the lack of diversity in generated imagery; now we can see why it’s such a tricky yardstick: should every group be rendered as diverse, or should it reflect the most likely mix of people in that particular scene or scenario?

When Google’s Gemini AI was first released, it was widely criticized for artificially injecting diversity into requests for historically accurate imagery. Think “politically correct” revisionist history. Google paused Gemini’s ability to generate images of people for months while this well-intentioned gaffe was fixed.

Is Bias the Reasonable Measure?

I believe that everyone is biased to some degree, so it might be unreasonable to expect that our robot overlords – uh, AI systems – will be able to act in a completely unbiased manner. Perhaps that shouldn’t be the goal. Instead, I would suggest that we, as the humans in the loop, must continue to work on being aware of how our own biases might be subtly reinforced by the AI systems we use.

The next generation deserves to live in a better, more accepting, and less prejudiced world, and one step on that journey has to be improving the AI systems we’re building and using in classrooms and offices. I encourage you to experiment with your favorite generative AI tool and see if you can spot social and cultural biases that might mirror your own.