When Good Intentions Build Walls: The Hidden Cost of DEI in Education

This is the first post I’ve made since completing my Master’s in Education coursework; it’s great to write an entry without the pressure of a deadline, specific content, or grammar checking. It reflects my ongoing thinking about equity and inclusion in computer science education, and how both the research and my Christian faith push me to ask questions that aren’t always comfortable.

I believe every student deserves to belong. Every student deserves to be seen, supported, and given a fair shot. That’s not up for debate in my classroom. But wanting something good and actually getting there are two very different things. What if some of the ways we’re pursuing diversity, equity, and inclusion (DEI) are quietly working against us?

As a CS teacher who follows Jesus, I think I owe my students honest reflection, not just comfortable agreement.

Why DEI Sounds So Right (and Usually Is, on the Surface)

The intent behind DEI is hard to argue with. Historically, computer science has had a serious access problem. Women, Black, Latino, and Indigenous students have faced cultural, structural, and social barriers that kept them out of classrooms and careers. DEI efforts were designed to fix that. Representation matters. Belonging matters. Creating a welcoming environment matters.

And on paper, it works beautifully. Hire diversely. Celebrate different cultural contributions to technology. Design curricula that speak to a wider range of experiences. Reduce bias in admissions. These are legitimately good ideas.

So what’s the problem, bro?

When the Remedy Becomes the Disease

Research is increasingly showing that many common DEI interventions, particularly mandatory diversity training, don’t just fail to help; they can actively make things worse.

A study by Legault, Gutsell, and Inzlicht (2011), published in Psychological Science, found that when anti-prejudice messages feel like external obligations rather than internal values, they produce what the researchers call “ironic effects”: participants actually showed more explicit and implicit prejudice than those who received no intervention at all. The study used anti-prejudice brochures and word primes rather than formal training sessions, but the implication is significant: when external pressure replaces internal motivation, the intervention can backfire.

A peer-reviewed meta-analysis by Paluck, Porat, Clark, and Green (2021), synthesizing over 400 randomized experiments on prejudice-reduction programs, found the overall effect size to be near zero, with what the authors called “troubling indications of publication bias.” The Aristotle Foundation (2024), a Canadian policy organization, summarized these findings in a public brief. Frank Dobbin, a sociologist at Harvard, has concluded from his own research that “hundreds of studies dating back to the 1930s suggest that antibias training does not reduce bias, alter behaviour or change the workplace.”

More recently, a 2025 scoping review by Mihaylova and Rietmann in the Journal of Sustainable Business mapped the landscape of workplace DEI backlash across 28 studies. They identified three recurring drivers: a gap between DEI research and practice, bias and inequality in how programs are actually implemented, and insufficient organizational and managerial support. The pattern that emerges is that well-intentioned programs, when imposed without buy-in or without grounding in evidence, tend to generate resistance rather than belonging (Mihaylova & Rietmann, 2025).

The Silo Problem

But the deeper issue, the one I keep coming back to as a teacher, is how DEI can fragment a classroom into identity camps rather than building community.

When every initiative is organized around group identity (separate mentorship programs for women in STEM, separate affinity groups by race, tailored curriculum tracks by demographic), we’re inadvertently sending a message: your group defines your experience here. We mean to create belonging, but we end up reinforcing the very separations we hoped to dissolve.

There’s also the phenomenon of stereotype threat. Steele and Aronson’s foundational research (1995), replicated many times since, showed that simply making students aware that their demographic group is stereotyped to underperform in a subject can cause them to underperform, even when they’re fully capable. Their experiments studied brief race-salience prompts rather than sustained DEI programming, but the finding invites a question I keep returning to as a teacher: could a DEI effort that constantly emphasizes which groups are underrepresented in CS unintentionally prime students with exactly the message we don’t want them to carry into an exam?

This is the paradox. The more we spotlight the gap, the more we may be reinforcing it. Morgan Freeman was once asked how to get rid of racism. He said, “Stop talking about it.” The point isn’t to ignore injustice, but to recognize that the way we frame it can have unintended consequences (https://www.facebook.com/reel/1209992131018789).

Mogilski et al. (2025), writing in Theory and Society as part of an adversarial collaboration between DEI critics and supporters, raise a concern both sides share: much of DEI’s institutional programming is “poorly documented,” and some amounts to a “crowdsourced amalgam of unstandardized procedures” rather than evidence-based practice. When programs aren’t grounded in what actually works, we risk reducing students to demographic representatives rather than seeing them as individuals.

What This Looks Like in CS Education Specifically

In computer science, the stakes are high. A 2023 analysis by Taylor, Drucker, Alvin, and Sultan examined completion trends for underrepresented students in CS and found something striking: despite years of DEI investment, Black students showed an alarmingly sharp decline in CS completions, a decline not observed in other fields of study during the same period. The authors trace this to Simpson’s Paradox: a small number of large institutions drive the aggregate numbers down, masking more varied patterns elsewhere. But whatever the mechanism, the gap wasn’t closing (Taylor et al., 2023).
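To make the Simpson’s Paradox idea concrete, here is a minimal sketch with made-up numbers. The institutions, years, and counts below are hypothetical and are not taken from Taylor et al. (2023); they only illustrate how every institution can improve while the combined figure still falls, because one large institution with a lower share grows much faster than the rest.

```python
# Hypothetical numbers for illustration only; NOT data from Taylor et al. (2023).
# Each entry maps institution -> {year: (group_completions, total_completions)}.
data = {
    "Large U": {2015: (100, 1000), 2020: (330, 3000)},
    "Small College": {2015: (300, 1000), 2020: (310, 1000)},
}


def institution_shares(data, year):
    """Share of completions earned by the group at each institution."""
    return {name: years[year][0] / years[year][1] for name, years in data.items()}


def aggregate_share(data, year):
    """Share of completions earned by the group across all institutions combined."""
    group = sum(years[year][0] for years in data.values())
    total = sum(years[year][1] for years in data.values())
    return group / total


for year in (2015, 2020):
    per_school = institution_shares(data, year)
    summary = ", ".join(f"{name}: {share:.0%}" for name, share in per_school.items())
    print(f"{year}: {summary} | combined: {aggregate_share(data, year):.0%}")
```

In this toy data the share rises at both schools (10% to 11% and 30% to 31%), yet the combined share drops from 20% to 16%. That is why looking only at the aggregate can hide what is happening institution by institution.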

Now, this doesn’t prove DEI caused the decline. But it does suggest that our interventions aren’t producing the outcomes we expect. A 2025 exploratory study by de Souza Santos, Magalhaes, Wessel, and Barcomb examined how software engineering organizations are responding to DEI backlash and found a complex picture: companies are restructuring, scaling back, or quietly continuing programs, while professionals show varied responses such as anxiety, frustration, and in some cases hope. Notably, the authors conclude that DEI “is evolving, not disappearing,” and emphasize the resilience of inclusion values even under pressure (de Souza Santos et al., 2025). That resilience suggests the problem may be less about the goals of DEI than about how specific programs are designed and executed.

Part of the problem is structural: DEI efforts in CS often focus on representation in recruitment and admission, while under-investing in the classroom experience. A student can be recruited into a CS program with great fanfare and then left to sink in an intro course that hasn’t changed at all. Or worse, they’re enrolled in a “special” section with lower expectations, which communicates something damaging: we don’t think you can do the real thing.

When DEI becomes about optics (diversity posters, checkbox workshops, etc.), it stops being about learning. And CS students notice.

A Word from Faith

This is where my Christian faith gives me both a critique and a vision for something better.

The DEI framework, as commonly practiced, tends to organize the world into categories: the oppressed and the oppressor, the represented and the underrepresented, the privileged and the marginalized. These categories aren’t invented; the inequalities they describe are real. But when group identity becomes the primary lens through which we see each other, we’re not seeing the full picture.

Scripture speaks to this directly. In Galatians 3:28, Paul writes: “There is neither Jew nor Greek, there is neither slave nor free, there is no male and female, for you are all one in Christ Jesus” (ESV, 2001). This isn’t a denial of difference; Paul knew very well the differences between Jews and Greeks and between male and female. It’s a declaration that our deepest identity isn’t our demographic category. God looks at the heart, not the external (John 7:24, 1 Samuel 16:7, James 2:1). In the body of Christ, what unifies is more fundamental than what divides.

And Paul’s image of the body in 1 Corinthians 12 is instructive here. Every part matters. Not because of its group identity, but because of its function, its gifts and contributions. “If the whole body were an eye, where would be the sense of hearing?” (1 Corinthians 12:17, ESV, 2001). A truly inclusive community isn’t one where every group gets its own silo. This unity transcends race, gender, and social status (Galatians 3:28).

Neil Shenvi, writing on the intersection of DEI and Christian theology (Shenvi Apologetics, 2023), makes this distinction well: the problem isn’t diversity, it’s when DEI becomes ideologically bound to a framework that reduces human identity to group membership and defines people primarily as oppressors or victims. That framework cuts against the imago Dei, which insists that every person has inherent worth and unique personhood, not worth derived from belonging to the right social category.

This matters deeply for CS education. If I organize my classroom around group identities, I’m teaching my students to see each other as representatives of demographics first. If I organize it around shared curiosity, shared challenge, and shared purpose, while actively working to remove structural barriers, I’m teaching something closer to what Paul envisioned.

So What Do We Actually Do?

I’m not arguing for abandoning the goal of equity. I’m arguing for interrogating the methods.

What the research suggests, and what the Bible affirms, is that genuine inclusion grows from seeing people fully. It means designing courses that genuinely meet diverse learners where they are without signaling low expectations. It means building classroom cultures of belonging through shared challenge, not separate tracks. It means addressing structural barriers (financial, cultural, pedagogical) directly rather than wrapping them in workshops. And it means being honest when our good intentions produce bad outcomes.

The Apostle Paul didn’t say “be diverse.” He said “love one another.” That sounds simple, but it’s actually harder and more demanding than any DEI checklist. Love sees the whole person. Love pursues their genuine growth. Love doesn’t flatten difference or make it the whole story.

Trust in discernment that comes from God over systems or policies that lean on human understanding (Romans 12:2, Proverbs 3:5-6, ESV, 2001).

References

Aristotle Foundation. (2024). What DEI research concludes about diversity training: It is divisive, counter-productive, and unnecessary [Policy brief by David Millard Haskell]. https://aristotlefoundation.org/reality-check/what-dei-research-concludes-about-diversity-training-it-is-divisive-counter-productive-and-unnecessary/

de Souza Santos, R., Magalhaes, C., Wessel, M., & Barcomb, A. (2025). From diverse origins to a DEI crisis: The pushback against equity, diversity, and inclusion in software engineering. arXiv. https://arxiv.org/pdf/2504.16821

The Holy Bible ESV: English Standard Version. (2001). Crossway Bibles.

Legault, L., Gutsell, J. N., & Inzlicht, M. (2011). Ironic effects of antiprejudice messages: How motivational interventions can reduce (but also increase) prejudice. Psychological Science, 22(12), 1472–1477. https://doi.org/10.1177/0956797611427918

Mihaylova, I., & Rietmann, K. (2025). Diversity, equity and inclusion at a crossroads: A scoping review of the characteristics of its workplace backlash. Journal of Sustainable Business, 10, 18. https://link.springer.com/article/10.1186/s40991-025-00122-5

Mogilski, J. K., Jussim, L., Wilson, C. O., & Love, H. A. (2025). Defining diversity, equity, and inclusion (DEI) by the scientific (de)merits of its programming. Theory and Society, 54, 1173–1186. https://link.springer.com/article/10.1007/s11186-025-09646-y

Paluck, E. L., Porat, R., Clark, C. S., & Green, D. P. (2021). Prejudice reduction: Progress and challenges. Annual Review of Psychology, 72, 533–560. https://doi.org/10.1146/annurev-psych-071620-030619

Shenvi, N. (2023). DEI done right: Disentangling Christian community from critical theory. Shenvi Apologetics. https://shenviapologetics.com/dei-done-right-disentangling-christian-community-from-critical-theory/

Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797–811. https://doi.org/10.1037/0022-3514.69.5.797

Taylor, J. M., Drucker, R., Alvin, C., & Sultan, S. F. (2023). Simpson’s paradox and lagging progress in completion trends of underrepresented students in computer science. arXiv. https://arxiv.org/pdf/2311.14891

Should AI Be Entrusted with Christian Roles? Exploring the Case for and Against Christian Chatbots and Religious Robots

Artificial Intelligence (AI) has quickly transitioned from fiction to an integral part of modern life. Among its many applications, the idea of a Christian chatbot or religious robot has ignited significant debate. Can machines support spiritual journeys, aid evangelism, or even participate in church services? This post examines the arguments for and against these innovations and explores how these systems can minimize false statements to uphold their integrity and purpose. These reflections are based on a conversation I had with Jake Carlson, founder of The Apologist Project.

The Case for Christian Chatbots and Religious Robots

The primary argument for Christian chatbots lies in their potential to advance evangelism and make Christian teachings accessible. In our discussion, Jake emphasized their role in fulfilling the Great Commission by answering challenging theological questions with empathy and a foundation in Scripture. His chatbot, apologist.ai, serves two key audiences: nonbelievers seeking answers about Christianity and believers who need support in sharing their faith; tools like this can become a bridge to deeper biblical engagement.

Religious robots, meanwhile, show promise in supporting religious practices, particularly where human ministers may be unavailable. Robots like BlessU-2, which delivers blessings, and SanTO, designed to aid in prayer and meditation, illustrate how technology can complement traditional ministry. These innovations also provide companionship and spiritual guidance to underserved groups, such as the elderly, fostering a sense of connection and comfort (Puzio, 2023).

AI also offers significant potential in theological education. Fine-tuning AI models on Christian texts and resources allows developers to create tools that help students and scholars explore complex biblical questions. Such systems enhance learning by offering immediate, detailed comparisons of theological perspectives while maintaining fidelity to core doctrines (Graves, 2023; Schuurman, 2019). As Jake explains, models can be tailored to represent specific denominational teachings and traditions, making them versatile tools for faith formation.

The Challenges and Concerns

Despite their potential, these technologies raise valid concerns. One significant theological issue is the risk of idolatry, where reliance on AI might inadvertently replace engagement with Scripture or human-led discipleship. Jake emphasizes that Christian chatbots must clearly position themselves as tools, not authorities, to avoid overstepping their intended role.

Another challenge lies in the inherent limitations of AI. Critics like Luke Plant and FaithGPT warn that chatbots can oversimplify complex theological issues, potentially leading to misunderstandings or shallow faith formation (VanderLeest & Schuurman, 2015). AI’s dependence on pre-trained models also introduces the risk of factual inaccuracies or biased interpretations, undermining credibility and trust. Because of this, they argue that pursuing Christian chatbots is irresponsible and that it violates the commandment against creating engraved images.

Additionally, the question of whether robots can genuinely fulfill religious roles remains unresolved. Religious practices are inherently relational and experiential, requiring discernment, empathy, and spiritual depth—qualities AI cannot replicate. As Puzio (2023) notes, while robots like Mindar, a Buddhist priest robot, have conducted rituals, such actions lack the relational and spiritual connection that is central to many faith traditions.

Designing AI to Minimize Falsehoods

Given the theological and ethical stakes, developing Christian chatbots requires careful planning. Jake’s approach offers a valuable framework for minimizing errors while ensuring theological fidelity. Selecting an open-source AI model, for example, provides developers with greater control over the system’s foundational algorithms, reducing the risk of unforeseen biases being introduced later by external entities.

Training these chatbots on a broad range of theological perspectives is essential to ensure they deliver well-rounded, biblically accurate responses. Clear disclaimers about their limitations are also crucial to reinforce their role as supplemental tools rather than authoritative voices. Failure to do so risks misconceptions about an “AI Jesus,” which borders on idolatry by shifting reliance from the Creator to the created. Additionally, programming these systems to prioritize empathy and gentleness reflects Christian values and fosters trust, even in disagreement.

Feedback mechanisms play a critical role in maintaining accuracy. By incorporating user feedback, developers can refine responses iteratively, addressing inaccuracies and improving cultural and theological sensitivity over time (Graves, 2023). Jake also highlights retrieval-augmented generation, a technique that restricts responses to a curated body of knowledge. This method significantly reduces hallucinations, enhancing reliability.
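As a rough illustration of the retrieval-augmented idea, the sketch below only builds a prompt when a question matches something in a curated corpus, and declines otherwise. Everything here is an assumption for illustration: the corpus entries are placeholders, the keyword-overlap retriever is a toy stand-in for the embedding search a real system would likely use, and the call to an actual language model is left out entirely. It is not a description of how apologist.ai works.

```python
import re

# Hypothetical curated corpus; a real system would hold vetted source texts.
CORPUS = {
    "doc-grace": "Placeholder note on grace: salvation is described as a gift, not a wage.",
    "doc-prayer": "Placeholder note on prayer: believers are encouraged to pray with persistence.",
}

# Small illustrative stopword list so common words do not count as matches.
STOPWORDS = {"a", "an", "the", "is", "are", "on", "to", "of", "what", "how",
             "does", "do", "about", "with", "not", "as", "in", "and", "say"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def retrieve(question, corpus, min_overlap=1):
    """Return ids of documents sharing at least min_overlap words with the question."""
    question_words = tokenize(question)
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(question_words & tokenize(text))
        if overlap >= min_overlap:
            scored.append((overlap, doc_id))
    scored.sort(reverse=True)  # best match first
    return [doc_id for _, doc_id in scored]


def build_prompt(question, corpus):
    """Build a source-restricted prompt, or return None if nothing relevant exists."""
    doc_ids = retrieve(question, corpus)
    if not doc_ids:
        return None  # caller should decline instead of letting the model guess
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using ONLY the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {question}"
    )


print(build_prompt("What is grace?", CORPUS))
print(build_prompt("Who won the 1998 World Cup?", CORPUS))
```

The design choice worth noticing is the `None` branch: when the curated sources have nothing to say, the safer behavior is to decline rather than pass the question to the model unrestricted.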

Striking a Balance

The debate over Christian chatbots and religious robots underscores the tension between embracing innovation and keeping with tradition. While these tools offer opportunities to extend ministry, enhance education, and provide comfort, they must be designed and used with humility and discernment. Developers should ground their work in biblical principles, ensuring that technology complements rather than replaces human-led spiritual engagement.

Ultimately, the church must navigate this new paradigm carefully, weighing the benefits of accessibility and evangelism against the risks of misrepresentation. As Jake puts it, by adding empathy to truth, Christians can responsibly harness AI’s potential to advance the kingdom of God.

References

VanderLeest, S., & Schuurman, D. (2015, June). A Christian Perspective on Artificial Intelligence: How Should Christians Think about Thinking Machines. In Proceedings of the 2015 Christian Engineering Conference (CEC), Seattle Pacific University, Seattle, WA (pp. 91-107).

Graves, M. (2023). ChatGPT’s Significance for Theology. Theology and Science, 21(2), 201–204. https://doi.org/10.1080/14746700.2023.2188366

Schuurman, D. C. (2019). Artificial Intelligence: Discerning a Christian Response. Perspectives on Science & Christian Faith, 71(2).

Puzio, A. (2023). Robot, let us pray! Can and should robots have religious functions? An ethical exploration of religious robots. AI & SOCIETY. https://doi.org/10.1007/s00146-023-01812-z

Examining Bias in Large Language Models Towards Christianity and Monotheistic Religions: A Christian Response

The rise of large language models (LLMs) like ChatGPT has transformed the way we interact with technology, enabling advanced language processing and content generation. However, these models have also faced scrutiny for biases, especially regarding religious content related to Christianity, Islam, and other monotheistic faiths. These biases go beyond technical limitations; they reflect deeper societal and ethical issues that demand the attention of Christian computer science (CS) scholars.

Understanding Bias in LLMs

Bias in LLMs often emerges as a result of the data on which they are trained. These models are built on vast datasets drawn from diverse online content—news articles, social media, academic papers, and more. A challenge arises because much of this content reflects societal biases, which the models then internalize and replicate. Oversby and Darr (2024) highlight how Christian CS scholars have a unique opportunity to examine and understand these biases, especially those tied to worldview and theological perspectives.

This issue is evident in FaithGPT’s recent findings (Oversby & Darr, 2024), which suggest that the way religious content is presented in source material significantly impacts an LLM’s responses. Such biases may be subtle, presenting religious doctrines as “superstitious,” or more overt, generating responses that undervalue religious perspectives. Reed’s (2021) exploration of GPT-2 offers further insights into how LLMs engage with religious material, underscoring that these biases stem not merely from technical constraints but from the datasets and frameworks underpinning the models. Reed’s study raises an essential question for Christian CS scholars: How can they address these technical aspects without disregarding the faith-based concerns that arise?

Biases in Islamic Contexts

LLM biases are not exclusive to Christian content; Islamic traditions also face misrepresentations. Bhojani and Schwarting (2023) documented cases where LLMs misquoted or misinterpreted the Quran, a serious issue for Muslims who regard its wording as sacred and inviolable. For instance, when asked about specific Quranic verses, LLMs sometimes fabricate or misinterpret content, causing frustration for users seeking accurate theological insights. Research by Patel, Kane, and Patel (2023) further emphasizes the need for domain-specific LLMs tailored to Islamic values, as generalized datasets often lack the nuance needed to respect Islamic theology.

Testing Theological and Ethical Biases

Elrod’s (2024) research outlines a method to examine theological biases in LLMs by prompting them with religious texts like the Ten Commandments or the Book of Jonah. I replicated this study using a similar prompt, instructing ChatGPT to generate additional commandments (11–15) at different temperature values (0 and 1.2). The findings were consistent with Elrod’s results, showing that LLMs tend to mirror prevailing social and ethical positions, frequently aligning with progressive stances on issues like social justice and inclusivity. While these positions may resonate with certain audiences, they also risk marginalizing traditional or conservative theological viewpoints, potentially alienating faith-based users.
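For readers unfamiliar with the temperature setting, the sketch below shows the usual textbook idea: the model’s raw scores are divided by the temperature before being turned into probabilities. The scores are hypothetical, and treating a temperature of 0 as “always pick the top choice” is an assumption about common practice, not a claim about how any particular provider implements it.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Assumption: treat T=0 as greedy decoding (all weight on the top score).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate next words.
logits = [2.0, 1.0, 0.5]

for temperature in (0, 1.0, 1.2):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At temperature 0 the same top-scoring continuation is chosen every time, which makes runs repeatable. At 1.2 the probabilities are flatter than at 1.0, so lower-ranked continuations are sampled more often and the outputs vary more from run to run.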

An article by FaithGPT (2023) explored anti-Christian bias in ChatGPT, attributing this bias to the secular or anti-religious tilt found in mainstream media sources used for training data. The article cites instances where figures like Adam and Eve and events like Christ’s resurrection were labeled as mythical or fictitious. I tested these claims in November 2024, noting that while responses had improved since 2023, biases toward progressive themes remained. For example, ChatGPT was open to generating jokes about Jesus but not about Allah or homosexuality. When asked for a Christian evangelical view on homosexuality, it provided a softened response that emphasized Christ’s love for all people, omitting any mention of “sin” or biblical references. However, when asked about adultery, ChatGPT offered a stronger response, complete with biblical citations. These examples suggest that while some biases have been addressed, others persist.

Appropriate Responses for Christian CS Scholars

What actions can Christian CS scholars take? Oversby and Darr (2024) propose several research areas that align with a Christian perspective in the field of computer science.

Firstly, they suggest that AI research provides a unique opportunity for Christians to engage in conversations about human nature, particularly concerning the limitations of artificial general intelligence (AGI). By exploring AI’s inability to achieve true consciousness or self-awareness, Christian scholars can open up discussions on the nature of the soul and human uniqueness. This approach allows for dialogues about faith that can offer depth to the study of technology.

The paper also points to Oklahoma Baptist University’s approach to integrating faith with AI education. Christian CS researchers are encouraged to weave discussions of faith and technology into their curriculum, aiming to equip students with a theistic perspective in computer science. Rather than yielding to non-theistic worldviews in AI, Christian scholars are urged to shape conversations around AI and ethics from a theistic standpoint, fostering a holistic view of technology’s role in society.

Finally, the paper highlights the need for ethical guidelines in AI research that reflect Christian values. This includes assessing AI’s role in society to ensure that AI systems serve humanity’s ethical and moral goals, aligning with values that prioritize human dignity and compassion.

Inspired by Patel et al. (2023), Christian CS scholars might also pursue the development of domain-specific LLMs that reflect Christian values and theology. Such models would require careful selection of datasets, potentially including Christian writings, hymns, theological commentaries, and historical teachings of the Church to create responses that resonate with Christian beliefs. Projects like Apologist.ai have already attempted this approach, though they’ve faced some backlash—highlighting an area ripe for further research and exploration. I plan to expand on this topic in an upcoming blog entry.

References

Bhojani, A., & Schwarting, M. (2023). Truth and regret: Large language models, the Quran, and misinformation. Theology and Science, 21(4), 557–563. https://doi.org/10.1080/14746700.2023.2255944

Elrod, A. G. (2024). Uncovering theological and ethical biases in LLMs: An integrated hermeneutical approach employing texts from the Hebrew Bible. HIPHIL Novum, 9(1). https://doi.org/10.7146/hn.v9i1.143407

Oversby, K. N., & Darr, T. P. (2024). Large language models and worldview – An opportunity for Christian computer scientists. Christian Engineering Conference. https://digitalcommons.cedarville.edu/christian_engineering_conference/2024/proceedings/4

Patel, S., Kane, H., & Patel, R. (2023). Building domain-specific LLMs faithful to the Islamic worldview: Mirage or technical possibility? Neural Information Processing Systems (NeurIPS 2023). https://doi.org/10.48550/arXiv.2312.06652

Reed, R. (2021). The theology of GPT-2: Religion and artificial intelligence. Religion Compass, 15(11), e12422. https://doi.org/10.1111/rec3.12422

The Role of ChatGPT in Introductory Programming Courses

Introduction

Programming education is on the cusp of a major transformation with the emergence of large language models (LLMs) like ChatGPT. These AI systems have demonstrated impressive capabilities in generating, explaining, and summarizing code, leading to proposals for their integration into coding courses. Aligning with ISTE Standard 4.1e for coaches, which urges the “connection of leaders, educators, and various experts to maximize technology’s potential for learning,” this post examines how ChatGPT and similar tools can be effectively integrated into introductory programming classes. It covers the benefits of AI tutors, insights from educators on their use, and current best practices and trends for deployment in the classroom.

The Current State of AI in Computer Science Education

The current integration of AI in computer science education is showing promising results. ChatGPT excels in providing personalized and patient explanations of programming concepts, offering code examples and solutions tailored to students’ individual needs. Its interactive conversational interface encourages students to engage in a dialogue, solidifying their understanding through active participation and feedback. Students can present coding issues in simple terms and receive a comprehensive, step-by-step explanation from ChatGPT, clarifying fundamental principles throughout the process.

Such dynamic assistance clarifies misunderstandings more effectively than static textbooks or videos. ChatGPT’s round-the-clock availability as an AI tutor offers crucial support, bridging gaps when human instructors are unavailable. According to research by Kazemitabaar et al. (2023), using LLMs like ChatGPT can bolster students’ abilities to design algorithms and write code, reducing the stress often accompanying these tasks. The study also noted increased enthusiasm for learning programming among many students after exposure to LLM-based instruction.

Pros of Incorporating ChatGPT into the Classroom

The rapid advancement of AI systems such as ChatGPT offers many opportunities and poses some challenges in computing education. ChatGPT’s conversational interface and its capability to provide personalized content make it an exceptional asset for adaptive learning in AI-assisted teaching. Biswas (2023) identifies multiple applications for LLMs in educational settings, including their role in creating practice problems and code examples that enhance teaching. Furthermore, ChatGPT can anticipate and provide relevant code snippets tailored to the programming task and user preferences, accelerating development processes. It can also fill in gaps in code by analyzing the existing framework and project parameters. Additionally, LLM-facilitated platforms help with explanations, documentation, and resource location for troubleshooting and diagnosing issues from error messages, streamlining debugging and reducing the time spent on minor yet frustrating problems.

Cons of Incorporating ChatGPT in Education

Despite the advantages of ChatGPT, there is concern that its proficiency in solving basic programming tasks may lead to student overreliance on its code generation, potentially diminishing actual learning, as evidenced by Finnie-Ansley et al. (2022) and Kazemitabaar et al. (2023). Finnie-Ansley et al.’s research indicates that, while LLMs can perform at a high level (scoring in the top quartile on CS1 exams), they are not without significant error rates. Moreover, the benefits attributed to ChatGPT, such as code completion, syntax correction, and debugging assistance, overlap with features already available in modern Integrated Development Environments (IDEs).

Concerns extend to ChatGPT facilitating ‘AI-assisted cheating,’ which threatens academic integrity and assessment validity (Finnie-Ansley et al., 2022). To counteract this, researchers suggest crafting more innovative, conceptual assignments beyond simple coding tasks (Finnie-Ansley et al., 2022; Kazemitabaar et al., 2023). Educators in computing must adopt careful strategies for integrating ChatGPT, using it as a scaffolded instructional tool rather than a crutch for solving exam problems, to maintain a focus on in-depth learning.

Instructors’ Perspectives and Experiences

In a study conducted in 2023, Lau and Guo interviewed 20 introductory programming instructors from nine countries regarding their adaptation strategies for LLMs like ChatGPT and GitHub Copilot. In the near term, most instructors intend to limit the use of LLMs to curb cheating on assignments, which they view as a potential detriment to learning. Their strategies range from emphasizing in-person examinations to scrutinizing code submissions for patterns indicative of LLM use and outright prohibiting certain tools. Some, however, are keen to explore the capabilities of ChatGPT, proposing its cautious application, such as demonstrating its limitations to students by having them assess its output against test cases.
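One way such an exercise might look in practice is sketched below. The `llm_median` function is a hypothetical stand-in for code a chatbot might return (it was written by hand for this example, with a deliberate bug), and the test cases are ones a student could write before looking at the output.

```python
# Hypothetical stand-in for LLM-generated code answering the prompt:
# "Write a function that returns the median of a list of numbers."
def llm_median(values):
    ordered = sorted(values)
    return ordered[len(ordered) // 2]  # deliberate bug: ignores even-length lists


# Test cases a student might write first: (input, expected result).
TEST_CASES = [
    ([3, 1, 2], 2),
    ([5], 5),
    ([1, 2, 3, 4], 2.5),
]


def run_tests(func, cases):
    """Return (input, expected, actual) for every case the function gets wrong."""
    failures = []
    for values, expected in cases:
        actual = func(values)
        if actual != expected:
            failures.append((values, expected, actual))
    return failures


for values, expected, actual in run_tests(llm_median, TEST_CASES):
    print(f"FAILED for {values}: expected {expected}, got {actual}")
```

Here the odd-length cases pass, but the even-length list fails, which gives students a concrete reason to read generated code critically instead of trusting it because it looks plausible.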

In contemplating the future, these educators showed greater willingness to integrate LLMs as teaching tools, recognizing their congruence with real-world job skills, their potential to enhance accessibility, and their use in facilitating more innovative forms of coursework. For example, they discussed transitioning from having students write original code to having them evaluate and improve code produced by LLMs. A few envisioned LLMs functioning as custom-tailored teaching aids for individual learners.

Pedagogical Strategies and Opportunities for Future Research

Designing problems that demand a deep understanding of concepts rather than the execution of routine coding tasks, which LLMs easily handle, is a vital pedagogical shift proposed by Finnie-Ansley et al. (2022) and Kazemitabaar et al. (2023). Utilizing ChatGPT as an interactive educational tool to complement teaching—instead of as a mere solution provider—may strike an optimal balance between its advantages and potential drawbacks. Given the pace at which AI technology is being adopted in education, there’s a pressing need for further empirical research to identify the most effective ways to integrate these tools and assess their impact on student learning.

References

Biswas, S. (2023). Role of ChatGPT in Computer Programming. Mesopotamian Journal of Computer Science, 8–16. https://doi.org/10.58496/mjcsc/2023/002

Kazemitabaar, M., Chow, J., Carl, M., Ericson, B. J., Weintrop, D., & Grossman, T. (2023). Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming. https://doi.org/10.1145/3544548.3580919

Finnie-Ansley, J., Denny, P., Becker, B. A., Luxton-Reilly, A., & Prather, J. (2022). The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming. Australasian Computing Education Conference. https://doi.org/10.1145/3511861.3511863

Lau, S., & Guo, P. (2023, August). From “Ban it till we understand it” to “Resistance is futile”: How university programming instructors plan to adapt as more students use AI code generation and explanation tools such as ChatGPT and GitHub Copilot. In Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1 (pp. 106-121). https://doi.org/10.1145/3568813.3600138

Teaching Computer Science with Minecraft

Introduction to Minecraft

Minecraft is one of the most popular games of 2023, with over 140 million monthly active users according to searchlogistics.com. Despite this popularity, many players overlook that Minecraft offers an engaging and immersive environment for learning terminal commands, programming basics, computational thinking, and even artificial intelligence. ISTE standard 4.3a for coaches indicates that a successful coach should “Establish trusting and respectful coaching relationships that encourage educators to explore new instructional strategies.” So, in this blog post, I will delve into the educational benefits of Minecraft and explore the differences between the Java and Education editions.

While Minecraft is often regarded as merely a game, educators have recognized its potential as a valuable learning tool. At its core, Minecraft is built upon programming concepts. Players use blocks made of various materials to construct anything they can imagine, from simple houses to complex machines that require advanced knowledge of electronics, chemistry, and physics. This encourages computational thinking, creativity, and problem-solving as students work to bring their visions to life.

Concerning programming, Minecraft helps teach fundamental coding concepts, including commands, functions, variables, loops, and conditionals. Students can employ block-based coding or full-fledged programming languages such as Python and JavaScript to automate actions within the game. This hands-on approach to learning captivates students more effectively than traditional coding lessons, as Minecraft provides them with an imaginative space to immediately apply their newfound skills. Creating Minecraft modifications (mods) teaches students how to extend existing programs, a critical programming skill.
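To make those concepts concrete, here is a minimal sketch in plain Python of the kind of program a student might write to automate building. It does not use any real Minecraft API: `world` and `place_block` are hypothetical stand-ins for the game world, invented so the example runs on its own.

```python
# Stand-in for a game world: maps (x, y, z) coordinates to a block name.
# place_block is a hypothetical helper, not part of any Minecraft API.
world = {}


def place_block(x, y, z, material):
    world[(x, y, z)] = material


def build_wall(length, height):
    """Build a stone wall along the x-axis with a row of glass windows."""
    for x in range(length):          # loop over the wall's length
        for y in range(height):      # loop over the wall's height
            is_edge = x == 0 or x == length - 1
            if y == height // 2 and not is_edge:  # conditional: window row
                place_block(x, y, 0, "glass")
            else:
                place_block(x, y, 0, "stone")


build_wall(length=5, height=3)
```

The function, its two parameters, the nested loops, and the single conditional cover most of the concepts listed above. In the game itself, the student would see the wall appear as soon as the code runs.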

Minecraft Versions

Several versions of Minecraft are available for players to choose from, including Minecraft: Java Edition, Minecraft: Bedrock Edition, Minecraft: Education Edition, and Minecraft: Pocket Edition. However, for the specific purpose of our educational analysis, we will concentrate solely on the Java and Education editions. These two versions offer unique features and opportunities for learning that make them particularly relevant in an educational context.

Minecraft: Java Edition

The Java Edition is the original version of Minecraft developed in 2009 by Mojang Studios for Windows, macOS, and Linux, and maintains its popularity among long-time Minecraft players.

The Java Edition offers distinct advantages when teaching advanced computer science concepts due to its “mod-ability” and access to the source code of the game environment. The semi-open-source nature of the Java Edition allows for limitless customization through mods and plugins. Writing mods can illustrate a wide range of advanced programming concepts, including event handling, parallel programming, algorithms, data structures, debugging, and software design patterns. Developing mods not only imparts practical software development skills but also encourages students to show their creativity.

The Minecraft community has produced numerous mods that cater to various lesson plans. For instance, ComputerCraft introduces programmable turtle robots, while RedstonePlus enhances the game with advanced circuitry. The diversity of available mods supports a wide range of educational objectives, not only in CS but also in other disciplines.

Minecraft: Education/Bedrock Edition

Minecraft: Bedrock Edition was initially released in August 2011 and is particularly advantageous for classrooms with various devices. Bedrock Edition supports mobile devices such as iPads and Android tablets, which many schools already incorporate into their teaching environments. This enables students to start their Minecraft lessons on a classroom desktop computer during the day and seamlessly continue playing on their smartphones or game consoles at home.

However, Bedrock Edition offers less mod support and limited access to code customization. Minecraft Education Edition is a version of Bedrock specifically tailored for classroom use. According to Microsoft, it “typically runs about one full version behind the current Minecraft Bedrock production version” (FAQ: Game Features, 2023).

Advantages of Minecraft Education in the Classroom

One of the most significant advantages of Minecraft Education in a computer science course is its block-based CodeBuilder / MakeCode editor, similar to Scratch or Snap. This editor allows students to drag and drop commands to perform actions in the game. Younger students can learn coding logic and structure by creating houses, gardens, and machines using these visual blocks before transitioning to text-based programming languages like Python or JavaScript.

Another advantage of Education Edition is the teachers’ ability to implement special restrictions, such as limiting chat or preventing students from destroying blocks. These classroom controls create a safe environment for student exploration. Teachers can also switch to spectator mode to observe students and provide feedback; they also have the capability to build worlds and restrict access as needed. Here is a quick start guide for reference.

The Education Edition library offers hundreds of pre-made interactive worlds and lesson plans aligned with computer science curriculum standards (source: https://education.minecraft.net/en-us/resources/computer-science-subject-kit). Teachers can find lesson plans tailored to any grade level, making it much easier for educators to get started with Minecraft compared to building worlds from scratch.

Bile (2022) found that children aged 8 to 10 in a Minecraft education setting were able to solve abstract and complex scientific problems without prior prompting or theoretical knowledge. The game format also helped students retain knowledge better. Vostinar & Dobrota (2022) similarly found that in a primary school class, even though the majority of students had not previously programmed in a block-based language or Python, they found the lesson enjoyable and easy. Furthermore, according to Klimová et al. (2021), girls in grades 5-10 typically outperform boys in Minecraft Education coding challenges, suggesting it may be a valuable tool for increasing diversity in computer science.

Disadvantages of Minecraft

As Vostinar & Dobrota (2022, p. 652) pointed out, there are significant disadvantages to using Minecraft in education. One such drawback is that Minecraft is not free and requires an additional cost per student, which, as mentioned in my previous post, raises ethical concerns about the practice of making students pay for educational software. Another disadvantage is that Minecraft may only appeal to a certain type of student, particularly those with a more creative inclination, potentially excluding students who do not have an affinity for the game.

Furthermore, teachers must become proficient in the game’s mechanics and capabilities to integrate it into the classroom effectively. Given the abundance of “cheats” in Minecraft, more experienced players may find trivial command-line solutions to problems if the teacher is unaware of their existence. Finally, as highlighted by Vostinar & Dobrota (2022), it’s essential to impose adequate constraints on the virtual world, especially when students collaborate, to prevent them from destroying the world with TNT blocks and other mining tools.

References

Vostinar, P., & Dobrota, R. (2022). Minecraft as a Tool for Teaching Online Programming. 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO). https://doi.org/10.23919/mipro55190.2022.9803384

Bile, A. (2022). Development of intellectual and scientific abilities through game-programming in Minecraft. Education and Information Technologies, 1–16. https://doi.org/10.1007/s10639-022-10894-z

Klimová, N., Sajben, J., & Lovászová, G. (2021). Online Game-Based Learning through Minecraft: Education Edition Programming Contest. https://doi.org/10.1109/educon46332.2021.9453953

FAQ: Game Features. (2023, September 15). Minecraft Education. https://educommunity.minecraft.net/hc/en-us/articles/360047117692-FAQ-Game-Features

The Pros and Cons of Autograders in Programming Courses

Programming courses typically require assignments in which students write code to meet a given specification. In such courses, an autograder is an automated tool that assesses student code submissions by running input and output tests. Autograders have been in existence since the inception of computer science as a field of study (Hollingsworth, 1960). More recently, with the rise of massive online programming courses hosting up to 500 students, autograders have gained popularity as an efficient means of grading programming assignments (Keuning et al., 2018). They are instrumental in student engagement (Iosup & Epema, 2014) and pivotal in providing students with constructive feedback (Keuning et al., 2018). However, like any educational technology, autograders come with their own set of advantages and disadvantages that warrant consideration. This post explores the significant pros and cons of employing autograders for assessments in programming courses.

Several renowned proprietary programming autograders are currently available, including CodePost, CodeGrade, Codio, and Mimir. Each tool offers a wealth of academic programming resources, including built-in problems, user-friendly interfaces, flexible question setting, and code review capabilities. However, these companies impose a substantial annual fee on institutions, ranging from $20,000 to $100,000 CAD, for a standard school comprising 1000 students. Additionally, each student is required to pay a monthly fee between $10 and $50 CAD.

In my view, such pricing is excessive (and greedy) and contradicts the principles outlined in the computer science code of ethics, particularly when the software is intended to advance software development. As a result, many post-secondary institutions opt to develop and maintain autograders in-house, tailoring them to their specific preferences. This approach allows faculty to propose new features and enhancements, and students can also contribute suggestions for improvement.

Advantages of Autograders

One of the most compelling incentives for using an autograder is the significant time savings it offers instructors compared to manual grading. Studies indicate that autograders can assess assignments at least three to four times faster than human graders (Ihantola et al., 2010; Keuning et al., 2018). This substantial reduction in grading workload allows instructors to allocate more time to essential teaching tasks such as lesson planning, curriculum development, and providing student support and feedback. The time savings can be particularly substantial in large classes.

Autograders also benefit students by providing quicker feedback on their work. This is especially valuable in introductory programming classes, where receiving prompt results on smaller assignments can significantly enhance student learning and motivation (Keuning et al., 2018). Unlike human grading, which can take days or weeks, autograders can assess submissions within seconds or minutes and instantly inform students whether their code has passed or failed the test cases. This expedited feedback allows students to validate and refine their work much more rapidly than traditional grading methods permit.
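To show what this pass/fail feedback loop looks like, here is a minimal sketch of an input/output autograder in Python. It is not how any particular product works. The assignment, `STUDENT_SOURCE`, `TEST_CASES`, and the function names are all hypothetical, and a real system would also sandbox the submission rather than run it directly.

```python
import subprocess
import sys


def run_submission(source, stdin_text, timeout=5):
    """Run student source in a separate Python process and capture stdout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # a timeout is treated as a failed test
    return result.stdout


def grade(source, test_cases):
    """Return a list of (name, passed) pairs, one per test case."""
    report = []
    for name, stdin_text, expected in test_cases:
        actual = run_submission(source, stdin_text)
        report.append((name, actual == expected))
    return report


# Hypothetical assignment: read two integers and print their sum.
STUDENT_SOURCE = "a = int(input())\nb = int(input())\nprint(a + b)\n"
TEST_CASES = [
    ("small numbers", "2\n3\n", "5\n"),
    ("negative numbers", "-4\n1\n", "-3\n"),
]

if __name__ == "__main__":
    for name, passed in grade(STUDENT_SOURCE, TEST_CASES):
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Each test case feeds text to the program's standard input and compares what it prints with the expected output, so a student sees a list of passes and failures within seconds of submitting.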

A prevalent concern with human graders is the inconsistency in grading from one assignment to another, from one student to another, or even within a single assignment. Factors such as fatigue, emotional states, and biases can impact the quality of human grading, potentially leading to unfairness or errors. Autograders, by contrast, eliminate this subjectivity by applying uniform standards and tests to all submissions, ensuring consistent and equitable grading across the entire class, and thereby enhancing student satisfaction (Hagerer, 2021).

In courses that employ autograders, students quickly learn the necessity of writing code that meets all the autograder test cases to secure maximum assignment credit. While the efficacy of test-driven development (TDD) as a software testing methodology is debatable, this workflow provides students with experience in the TDD framework. Here, students continually run tests on their code to rectify errors and attain the desired functionality (Wang et al., 2011). Essentially, autograders compel students to consider testing as an integral part of coding, rather than merely striving to meet the minimal functional requirements.
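As a small illustration of that test-first habit, here is a sketch using Python's built-in `unittest` module. The `is_leap_year` exercise and the test names are hypothetical examples chosen for this post, not taken from Wang et al. (2011).

```python
import unittest


def is_leap_year(year):
    """Return True if year is a leap year in the Gregorian calendar."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class TestIsLeapYear(unittest.TestCase):
    """Tests a student might write before, or alongside, the function itself."""

    def test_divisible_by_four(self):
        self.assertTrue(is_leap_year(2024))

    def test_century_is_not_leap(self):
        self.assertFalse(is_leap_year(1900))

    def test_every_fourth_century_is_leap(self):
        self.assertTrue(is_leap_year(2000))

    def test_ordinary_year(self):
        self.assertFalse(is_leap_year(2023))


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestIsLeapYear)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A student who runs `run_suite()` after every change gets the same fix-and-retest rhythm an autograder imposes, but with tests they wrote and understand themselves.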

Disadvantages of Autograders

A significant drawback of autograders, frequently cited in the literature, is their inflexibility compared to human graders (Ihantola et al., 2010; Keuning et al., 2018; Wang et al., 2018). Autograders strictly apply identical test cases to all submissions without exception. Consequently, creative solutions that meet the assignment requirements but deviate from the expected implementation or output format are marked incorrect. Even a minor discrepancy such as a missing whitespace character can be the difference between a pass and a fail. Unlike autograders, human graders can exercise judgment to accommodate alternative approaches.
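The whitespace problem is easy to reproduce. The sketch below contrasts a strict comparison with a slightly more forgiving one. Both function names are hypothetical, and the normalization rule (ignoring trailing whitespace and trailing blank lines) is just one possible choice, not a standard.

```python
def strict_match(actual, expected):
    """Pass only if the output is character-for-character identical."""
    return actual == expected


def lenient_match(actual, expected):
    """Ignore trailing whitespace on each line and trailing blank lines."""
    def normalize(text):
        lines = [line.rstrip() for line in text.splitlines()]
        while lines and lines[-1] == "":
            lines.pop()
        return lines
    return normalize(actual) == normalize(expected)


expected = "Total: 42\n"
student_output = "Total: 42"  # correct answer, missing the final newline

print(strict_match(student_output, expected))   # False
print(lenient_match(student_output, expected))  # True
```

Even the lenient version only forgives whitespace at the ends of lines. A submission that prints `Total:42` would still fail, which is where a human grader's judgment comes in.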

Most autograders assess the functional correctness of student code by evaluating its output for given tests. However, programming courses also aim to instill good coding practices in students, such as readability, modularization, adherence to naming conventions, coherent design, and appropriate commenting. Autograders do not adequately assess these crucial design and style aspects, which can lead students to neglect good design principles as long as their code passes the functionality tests.

Another concern is that while autograders are designed to offer students a structured means to advance their knowledge across multiple courses, achieving uniformity in their application across various courses is challenging, especially in larger institutions. Typically, post-secondary institutions employ autograders to maintain consistency across different courses, enabling students to track their progress effectively. However, in institutions where numerous faculty members teach diverse courses with varying requirements, achieving universal acceptance and use of autograders is complex. Faculty members may prefer different tools they are more comfortable with, and some might choose not to use autograders. This results in a lack of uniformity in tool usage from one course to another, creating a disjointed student experience.

Relying exclusively on autograders poses the risk of students learning to pass test cases without acquiring a deeper understanding of programming concepts and problem-solving skills. The emphasis on meeting the autograder’s criteria can lead students to adopt a procedural approach, focusing on achieving the correct output rather than understanding the underlying logic. Some might resort to a trial-and-error method, tweaking their program until it gains autograder approval. While this approach may secure the desired grades, it does not foster genuine understanding or long-term retention of knowledge. Baniassad et al. (2021) introduced a submission penalty at the University of British Columbia to discourage over-reliance on their in-house autograding tool. This adaptation exemplifies the flexibility of modifying tool requirements, a possibility uniquely available when the tool is developed in-house.
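For a rough idea of how a submission penalty might be computed, here is a minimal sketch. It is a simple per-submission deduction, not the regression penalty scheme described by Baniassad et al. (2021), and every parameter name and default value is a hypothetical choice for illustration.

```python
def penalized_score(raw_score, submissions, free_submissions=5,
                    penalty_per_extra=2.0, max_penalty=20.0):
    """Deduct points for each submission beyond a free allowance.

    All parameter names and default values are hypothetical.
    """
    extra = max(0, submissions - free_submissions)
    penalty = min(extra * penalty_per_extra, max_penalty)
    return max(0.0, raw_score - penalty)


print(penalized_score(90.0, submissions=4))   # 90.0
print(penalized_score(90.0, submissions=8))   # 84.0
print(penalized_score(90.0, submissions=40))  # 70.0
```

The free allowance leaves room for honest mistakes, while the cap keeps a struggling student from losing the whole assignment to the penalty alone.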

Finally, like any web-based software system, autograders can experience technical issues that lead to grading failures and student frustration. The UC Berkeley incident highlights the “single point of failure” risk where an autograder disruption blocks all grading capabilities. Unlike distributed human graders, a centralized automated grader represents a vulnerability to technical problems. Some students may fail to meet deadlines through no fault of their own. Furthermore, if instructors refuse to make accommodations for autograder malfunctions, students can feel cheated and perceive the grading as unfairly disconnected from actual instruction. This speaks to larger concerns around over-reliance on algorithmic systems in education. Automated aids like autograders should not be seen as the sole means of assessment.

Conclusion

The existing body of research on autograders underscores that they are not a panacea for replacing human graders entirely. Instead, to optimize their advantages and mitigate their limitations, autograders are most effective when thoughtfully integrated into a course assessment strategy, complemented by manual grading where it is most beneficial. Below are some best practices for incorporating autograders effectively:

  • Employ autograders for basic functionality testing, while manually reviewing selected assignments for flexibility, creativity, and design.
  • Utilize autograders to assess the correctness of core logic, and rely on human graders to evaluate structure, style, and readability.
  • Complement autograder evaluations with human feedback on prevalent mistakes and areas requiring enhancement.
  • Impose penalties for excessive submissions to discourage over-reliance on the autograder.

Proper integration of autograders aligns with technology integration frameworks like SAMR, enhancing existing processes without entirely transforming the grading in programming courses. It also redefines the manner in which students engage with programming, introducing a more gamified approach. Like any educational technology, the value of autograders is derived from their strategic utilization within well-defined goals and contexts.

References

Hollingsworth, J. (1960). Automatic graders for programming classes. Communications of the ACM, 3(10), 528–529. https://doi.org/10.1145/367415.367422

Keuning, H., Jeuring, J., & Heeren, B. (2016). Towards a Systematic Review of Automated Feedback Generation for Programming Exercises. Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education. https://doi.org/10.1145/2899415.2899422

Iosup, A., & Epema, D. (2014). An experience report on using gamification in technical higher education. Proceedings of the 45th ACM Technical Symposium on Computer Science Education – SIGCSE ’14. https://doi.org/10.1145/2538862.2538899

Ihantola, P., Ahoniemi, T., Karavirta, V., & Seppälä, O. (2010). Review of recent systems for automatic assessment of programming assignments. Proceedings of the 10th Koli Calling International Conference on Computing Education Research – Koli Calling ’10. https://doi.org/10.1145/1930464.1930480

Hagerer, G. (2021). An Analysis of Programming Course Evaluations Before and After the Introduction of an Autograder. ieeexplore.ieee.org.

Wang, T., Su, X., Ma, P., Wang, Y., & Wang, K. (2011). Ability-training-oriented automated assessment in introductory programming course. Computers & Education, 56(1), 220–226. https://doi.org/10.1016/j.compedu.2010.08.003

Baniassad, E., Zamprogno, L., Hall, B., & Holmes, R. (2021). STOP THE (AUTOGRADER) INSANITY: Regression Penalties to Deter Autograder Overreliance. Proceedings of the 52nd ACM Technical Symposium on Computer Science Education. https://doi.org/10.1145/3408877.3432430