My Teaching Mission Statement

This post was developed as part of an engagement project for my Education Doctorate coursework. It explores how Christian values can inform ethical practices in computer science education, connecting ISTE Standard 4.7 with service learning, civic engagement, and professional responsibility.

My teaching prioritizes biblical principles, ethical frameworks, and practical instructional strategies. Here, I consider how faith and pedagogy can come together to prepare students as both skilled technologists and ethical leaders.

Values and Ethical Issues

As a Christian Computer Science educator, my values and ethical principles related to ISTE Standard 4.7 reflect the intersection of my faith and my vocation. My mission is to equip students to excel in technical knowledge and to model Christian values in their work, interactions, and wider digital lives.

I envision my students harnessing the power of computer science to promote the common good, driving positive societal change through collaborative service-learning experiences and the responsible exploration of cutting-edge technologies like AI. By embedding ethical principles and a heart for service into their digital lives, they will lead with integrity, champion justice, and steward technology as a force for human flourishing.

Civic Responsibility and Christian Responses

I encourage students to apply their technical knowledge for the betterment of society, serving the community and improving technological literacy while satisfying course outcomes. Integrating service learning into the curriculum gives students a unique opportunity to engage in meaningful, community-centred projects. In a systematic review of service learning in computer and information science, Robledo Yamamoto et al. (2023) identified its benefits, challenges, and best practices, showing that service learning strengthens students’ practical skills and community awareness while also emphasizing the need for greater benefits for all stakeholders, especially non-profit community partners. Such experiences encourage students to reflect on their values and consider the ethical implications of their work, ensuring they approach their careers with integrity and compassion.

Self-reflection is a cornerstone of effective collaboration, particularly in software engineering, where the success of a project often hinges on the team’s ability to communicate and adapt to client needs, as noted by Groeneveld et al. (2019). Drawing on these principles, I encourage students to develop socially responsible practices, recognizing the impact their work can have on their digital and physical communities.

This aligns with Matthew 22:37–39 in the English Standard Version (ESV) of the Bible: “And he said to him, ‘You shall love the Lord your God with all your heart, with all your soul, and with all your mind. This is the great and first commandment. And a second is like it: You shall love your neighbour as yourself.’”

This principle also supports ISTE Standard 4.7b: “Inspire and encourage educators and students to use technology for civic engagement and to address challenges to improve their communities.” Students are guided to develop socially responsible practices, respecting their digital and physical communities.

Integrating Christian Values in Computer Science Education

Christian values can be a strong foundation in computer science education by fostering ethical decision-making among students. I emphasize discernment, integrity, and humility, modelling for students how to weigh the moral implications of their work so they can become both skilled technologists and ethical leaders. I teach frameworks like the ACM Code of Ethics and the IEEE Global Initiative on Ethics of AI, which provide valuable guidance for navigating complex moral dilemmas (Anderson, 2018).

An important aspect of this instruction is teaching students methods and tools for discernment. One effective framework is deontological ethics, particularly divine command theory and natural law. I also guide students through scenario-based exercises in which they evaluate potential outcomes of their technical work against ethical principles. For example, examining law enforcement’s use of historical crime data, social media activity, and demographic information to forecast potential criminal activity pushes students toward a critical, comprehensive approach to ethics that balances technical knowledge with moral clarity.

I believe it is crucial to model ethical decision-making in my teaching and invite students to emulate this approach professionally. This aligns with the Apostle Paul’s message to the Corinthian church: “Be imitators of me, as I am of Christ” (1 Corinthians 11:1, ESV, 2001). Such practices encourage them to reflect on how their work impacts others and how their faith informs their responsibilities as computer scientists.

This principle supports ISTE Standard 4.7a’s call to enhance communities. Students are encouraged to exhibit ethical behaviour and lead with integrity, which prepares them to become ethical leaders in the tech industry.

AI/Machine Learning and Theological Perspectives

Advancements in artificial intelligence offer tremendous opportunities for innovation but also demand critical examination through the lens of Christian theology. Biblical teachings on human dignity, stewardship, and the risks of over-reliance on technology provide a valuable framework for assessing the societal implications of AI. In my teaching, I encourage students to reflect on the intersections between their work in AI and the inclusive, basic Christian values introduced by Horace Mann, emphasizing the ethical responsibility to steward technology wisely and compassionately. Discussions often center on the potential benefits and dangers of AI, particularly its impact on society and individual well-being, as well as the role of computer scientists as ethical leaders.

Oversby and Darr (2024) suggest that a materialistic worldview leads many AI researchers and enthusiasts to envision Artificial General Intelligence (AGI) with autonomous goals, potentially posing risks to humanity. This contrasts with the classical Christian worldview, which upholds the uniqueness of human intelligence, grounded in a soul that is not reducible to algorithms. Schuurman (2015, p. 20) notes that, regardless of AI advancements, human beings’ distinctive nature, made in God’s image with free will and consciousness, should remain unquestioned.

This principle ties to ISTE Standard 4.7c, which encourages educators to critically examine online media sources and identify underlying assumptions. Students are invited to balance innovative problem-solving with discernment, aiming to act as responsible stewards of both technology and humanity.

Team Dynamics in Development Projects

Creating a healthy team environment in development projects requires intentional effort to foster collaboration, effective communication, and ethical leadership. Biblical principles such as servant leadership (Philippians 2:5-8), teamwork (Ecclesiastes 4:9-12), and conflict resolution (Matthew 18:15-17) offer guidelines for developing these qualities. 1 Corinthians 12 provides a powerful metaphor for the church as a body, emphasizing the value of each member’s unique contributions and the importance of working together harmoniously. Drawing from this, students are encouraged to lead with humility, collaborate with integrity, and approach conflicts as opportunities for growth and mutual understanding.

Diaz-Sprague and Sprague (2024) identify significant gaps in ethical training and teamwork skills across technology disciplines, particularly in computer science and engineering. They note the inconsistent application of key teamwork principles and suggest structured exercises focusing on communication and cooperation. These exercises, which have garnered positive feedback from students, highlight the importance of intentional training in these areas to prepare computer science students for real-world workplace challenges. Incorporating such activities into the curriculum allows students to practice these skills in a controlled setting, adopting a culture of respect, inclusion, and collaboration that translates into their professional environments.

In my software engineering courses, students engage in role-playing scenarios to address team conflicts, reflecting on how conflict resolution principles can transform challenges into opportunities for improving relationships and productivity. They also participate in team retrospectives, where they assess their group dynamics, communication, and decision-making processes, identifying areas for improvement. These practices align with the principles of servant leadership, encouraging students to prioritize the success and well-being of their team members while contributing their best efforts to shared goals.

This approach aligns with ISTE Standard 4.7b, which emphasizes fostering a culture of respectful interactions, particularly in online and digital collaborations. Grounding teamwork practices in biblical principles and integrating structured exercises that build essential skills allow students to learn to navigate the complexities of team dynamics with grace and professionalism.

References

Robledo Yamamoto, F., Barker, L., & Voida, A. (2023). CISing Up Service Learning: A Systematic Review of Service Learning Experiences in Computer and Information Science. ACM Transactions on Computing Education. https://doi.org/10.1145/3610776

The Holy Bible ESV: English Standard Version. (2001). Crossway Bibles.

Oversby, K. N., & Darr, T. P. (2024). Large language models and worldview – An opportunity for Christian computer scientists. Christian Engineering Conference. https://digitalcommons.cedarville.edu/christian_engineering_conference/2024/proceedings/4

Schuurman, D. C. (2015). Shaping a Digital World: Faith, Culture and Computer Technology. InterVarsity Press.

Diaz-Sprague, R., & Sprague, A. P. (2024). Embedding Moral Reasoning and Teamwork Training in Computer Science and Electrical Engineering. The International Library of Ethics, Law and Technology, 67–77. https://doi.org/10.1007/978-3-031-51560-6_5

Anderson, R. E. (2018). ACM code of ethics and professional conduct. Communications of the ACM, 35(5), 94–99. https://doi.org/10.1145/129875.129885

Groeneveld, W., Vennekens, J., & Aerts, K. (2019). Software Engineering Education Beyond the Technical: A Systematic Literature Review. ArXiv.org. https://doi.org/10.48550/arxiv.1910.09865

Measuring AI Chatbots: Evaluation Methods

AI chatbots are now an essential part of everyday life, playing key roles in everything from customer service to educational support. As these tools become more widely used, assessing their performance, particularly their accuracy and relevance, has become increasingly important. However, evaluating chatbot effectiveness is no simple task, given the wide range of functions they perform and the subjective nature of what constitutes “accuracy” in different scenarios. This blog entry examines both quantitative and qualitative approaches to measuring chatbot accuracy, highlighting their respective advantages and challenges.

Quantitative Approaches

Quantitative approaches provide numerical measures of chatbot accuracy, offering objectivity and scalability. 

Traditional Natural Language Processing (NLP) Metrics

Metrics such as BLEU, ROUGE, and perplexity are commonly used to evaluate language model outputs. BLEU measures overlaps in n-grams (sequences of consecutive tokens), ROUGE focuses on recall-based matching, and perplexity assesses the uncertainty in predictions. These metrics are objective, scalable to large datasets, and effective for comparing chatbot responses to reference answers. They provide quick and automated assessments and are often supported by pre-existing libraries within many frameworks. Additionally, these metrics are well-established in natural language processing (NLP) research.
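To make these metrics concrete, here is a minimal sketch that scores a single chatbot reply against a reference answer. It assumes the nltk and rouge-score packages are installed, and the example strings are invented; perplexity is omitted because it requires access to a model’s token probabilities.

```python
# Minimal sketch: scoring one chatbot reply against a reference answer.
# Assumes `pip install nltk rouge-score`; the example strings are invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "Photosynthesis converts light energy into chemical energy."
candidate = "Photosynthesis turns light energy into chemical energy in plants."

# BLEU: n-gram precision overlap (smoothing avoids zero scores on short texts).
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-L: recall-oriented overlap based on the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, candidate)["rougeL"]

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-L precision={rouge_l.precision:.3f} recall={rouge_l.recall:.3f}")
```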

However, these metrics have notable limitations. They fail to capture deeper conversational elements such as context and semantic meaning, making them less effective for evaluating open-ended, creative, or contextually nuanced responses. Furthermore, each metric evaluates only a specific aspect of performance while ignoring others. For instance, ROUGE focuses on content overlap but lacks the ability to assess semantic accuracy, while BLEU is effective for measuring translation precision but does not account for context or fluency (Banerjee et al., 2023; Meyer et al., 2024).

End-to-End (E2E) Benchmarking

Banerjee et al. (2023) proposed an End-to-End (E2E) benchmark that compares bot responses with “Golden Answers” using cosine similarity. This technique measures the precision and utility of responses, which is especially helpful for LLM-powered chatbots. The E2E benchmark offers an effective comparison framework built on user-centric metrics.

One of this method’s key strengths is that it accounts for semantics and context. Unlike traditional metrics that depend on exact word matches, cosine similarity over embeddings captures what responses mean, accommodating synonyms, context, and variations in sentence structure. For example, if a user asks a question about apologetics, an answer semantically close to one written by an experienced apologist will score as helpful and appropriate.
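Banerjee et al. describe the approach at a higher level than code, so the following is only a minimal sketch of the general idea, assuming the sentence-transformers package, an off-the-shelf embedding model, and invented example strings.

```python
# Minimal sketch: E2E-style scoring of a bot response against a "Golden Answer"
# via embedding cosine similarity. Model choice and strings are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

golden_answer = "Apologetics is the reasoned defense of the Christian faith."
bot_response = "Apologetics means giving a rational defense of Christianity."

embeddings = model.encode([golden_answer, bot_response], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()  # ~1.0 = same meaning

print(f"Cosine similarity to the Golden Answer: {score:.3f}")
```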

However, the E2E benchmark has its limitations. It relies on a built-in set of “Golden Answers” that are typically tuned to a limited range of expected queries. In practice, users often ask unpredictable, novel, or contextually specific questions that these fixed answers do not cover. Subjectivity is also a problem when it comes to “correctness”: open-ended questions admit multiple valid answers depending on how the question is understood, and errors in generating the Golden Answers themselves further complicate judgments. Finally, in dynamically changing domains like news or research, such canned answers quickly become outdated and stop reflecting current knowledge, making the evaluation less robust.

Psychological Metrics

Giorgi et al. (2023) introduced psychological metrics to evaluate chatbots based on factors such as emotional entropy, empathy, and linguistic style matching. These measures focus on human-like traits, offering a distinctive approach to assessing conversational quality. This method provides valuable insights into how effectively a chatbot mimics human behaviours and responses, offering a heuristic for understanding its conversational capabilities.

Despite its strengths, this approach has some disadvantages. It is computationally intensive, making it less scalable compared to traditional metrics. Furthermore, standardizing these evaluations across diverse conversational contexts proves challenging, as emotional and relational dynamics can vary widely depending on the specific interaction.

Qualitative Approaches

Qualitative approaches provide a more nuanced evaluation of chatbots, enabling the assessment of aspects such as creativity, contextual relevance, and subjective measurements. This flexibility allows evaluators to appreciate how a chatbot responds to open-ended prompts, aligns with user intent, and handles creative or context-specific scenarios.

Human Evaluation

Human evaluators assess chatbot responses by considering factors such as fluency, relevance, and informativeness. While inherently subjective, this approach captures nuances often overlooked by automated metrics. It offers deep insights into real-world performance and user satisfaction, providing the flexibility to evaluate open-ended and creative tasks. Human evaluators can appreciate subtleties like humor, creativity, and style, assessing whether responses align with the intent of the prompt. They are also capable of evaluating the originality, coherence, and emotional resonance of a chatbot’s interactions.

This approach has several drawbacks. It is resource-intensive, both in terms of cost and time, and is challenging to scale for large evaluations. Additionally, it is prone to evaluator bias, which can affect consistency and reliability. For meaningful comparisons, particularly between newer and older chatbots, the same evaluator would ideally need to be involved throughout the process, a scenario that is often impractical. These challenges highlight the trade-offs involved in relying on human evaluations for chatbot assessment.

Moral and Ethical Standards

Evaluations based on ethical principles are vital for chatbots that address sensitive topics, ensuring their responses align with societal norms and moral expectations. Aharoni et al. (2024) highlighted this through the modified Moral Turing Test (m-MTT), which measures whether AI-generated moral judgments are comparable to those made by humans. By requiring AI systems to produce ethical reasoning that aligns with established human standards, this approach helps promote inclusivity and safeguards societal norms. Notably, the study found that participants judged AI-generated moral reasoning as superior to human reasoning in certain instances, emphasizing the importance of fostering ethical perceptions in chatbot design.

Ethical standards provide a crucial benchmark for chatbots to emulate expected human behaviour. For example, a chatbot might be evaluated on its ability to avoid promoting stereotypes or prejudice, adhering to principles of inclusivity and fairness. Additionally, such evaluations help identify potential risks, such as harm or misinformation, ensuring chatbots operate within legal and ethical boundaries while safeguarding user rights.

However, these evaluations have their disadvantages. They are highly subjective, often influenced by cultural or personal biases, and complex to design and implement effectively. Furthermore, designers must be cautious not to foster overconfidence in chatbots. As Aharoni et al. (2024) warned, chatbots perceived as more competent than humans might lead users to uncritically accept their moral guidance, creating undue trust in potentially flawed or harmful advice. This highlights the importance of implementing ethical safeguards to mitigate these risks and ensure chatbots are both effective and responsible in addressing moral and ethical concerns.

Mixed Approaches

Quantitative and qualitative methods each bring unique strengths to chatbot evaluation, but both have notable limitations. Quantitative metrics, such as BLEU or E2E, excel in scalability and objectivity, making them ideal for large-scale assessments. However, these metrics often fall short of capturing the subtleties of human communication, such as context, creativity, and emotional depth. On the other hand, qualitative evaluations, including human judgment or moral frameworks, provide richer insights by accounting for nuanced aspects of interaction. These approaches offer a deeper understanding of a chatbot’s performance but are resource-intensive and prone to subjective biases. To address these challenges, a hybrid approach that combines both methods can be highly effective.

References

Aharoni, E., Fernandes, S., Brady, D. J., Alexander, C., Criner, M., Queen, K., Rando, J., Nahmias, E., & Crespo, V. (2024). Attributions toward artificial agents in a modified Moral Turing Test. Scientific Reports, 14(1), 8458. https://doi.org/10.1038/s41598-024-58087-7

Banerjee, D., Singh, P., Avadhanam, A., & Srivastava, S. (2023). Benchmarking LLM powered Chatbots: Methods and Metrics. ArXiv.org. https://arxiv.org/abs/2308.04624

Giorgi, S., Havaldar, S., Ahmed, F., Akhtar, Z., Vaidya, S., Pan, G., Ungar, L. H., Andrew, S. H., & Sedoc, J. (2023). Psychological Metrics for Dialog System Evaluation. ArXiv.org. https://arxiv.org/abs/2305.14757

Meyer, S., Singh, S., Tam, B., Ton, C., & Ren, A. (2024). A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case. ArXiv.org. https://arxiv.org/abs/2408.03562

Should AI Be Entrusted with Christian Roles? Exploring the Case for and Against Christian Chatbots and Religious Robots

Artificial Intelligence (AI) has quickly transitioned from fiction to an integral part of modern life. Among its many applications, the idea of a Christian chatbot or religious robot has ignited significant debate. Can machines support spiritual journeys, aid evangelism, or even participate in church services? This post examines the arguments for and against these innovations and explores how these systems can minimize false statements to uphold their integrity and purpose. These reflections are based on a conversation I had with Jake Carlson, founder of The Apologist Project.

The Case for Christian Chatbots and Religious Robots

The primary argument for Christian chatbots lies in their potential to advance evangelism and make Christian teachings accessible. In our discussion, Jake emphasized their role in fulfilling the Great Commission by answering challenging theological questions with empathy and a foundation in Scripture. His chatbot, apologist.ai, serves two key audiences: nonbelievers seeking answers about Christianity and believers who need support in sharing their faith; tools like this can become a bridge to deeper biblical engagement.

Religious robots, meanwhile, show promise in supporting religious practices, particularly where human ministers may be unavailable. Robots like BlessU-2, which delivers blessings, and SanTO, designed to aid in prayer and meditation, illustrate how technology can complement traditional ministry. These innovations also provide companionship and spiritual guidance to underserved groups, such as the elderly, fostering a sense of connection and comfort (Puzio, 2023).

AI also offers significant potential in theological education. Fine-tuning AI models on Christian texts and resources allows developers to create tools that help students and scholars explore complex biblical questions. Such systems enhance learning by offering immediate, detailed comparisons of theological perspectives while maintaining fidelity to core doctrines (Graves, 2023; Schuurman, 2019). As Jake explains, models can be tailored to represent specific denominational teachings and traditions, making them versatile tools for faith formation.

The Challenges and Concerns

Despite their potential, these technologies raise valid concerns. One significant theological issue is the risk of idolatry, where reliance on AI might inadvertently replace engagement with Scripture or human-led discipleship. Jake emphasizes that Christian chatbots must clearly position themselves as tools, not authorities, to avoid overstepping their intended role.

Another challenge lies in the inherent limitations of AI. Critics like Luke Plant and FaithGPT warn that chatbots can oversimplify complex theological issues, potentially leading to misunderstandings or shallow faith formation (VanderLeest & Schuurman, 2015). AI’s dependence on pre-trained models also introduces the risk of factual inaccuracies or biased interpretations, undermining credibility and trust. Because of this, they argue that pursuing Christian chatbots is irresponsible and violates the commandment against creating graven images.

Additionally, the question of whether robots can genuinely fulfill religious roles remains unresolved. Religious practices are inherently relational and experiential, requiring discernment, empathy, and spiritual depth—qualities AI cannot replicate. As Puzio (2023) notes, while robots like Mindar, a Buddhist priest robot, have conducted rituals, such actions lack the relational and spiritual connection that is central to many faith traditions.

Designing AI to Minimize Falsehoods

Given the theological and ethical stakes, developing Christian chatbots requires careful planning. Jake’s approach offers a valuable framework for minimizing errors while ensuring theological fidelity. Selecting an open-source AI model, for example, provides developers with greater control over the system’s foundational algorithms, reducing the risk of unforeseen biases being introduced later by external entities.

Training these chatbots on a broad range of theological perspectives is essential to ensure they deliver well-rounded, biblically accurate responses. Clear disclaimers about their limitations are also crucial to reinforce their role as supplemental tools rather than authoritative voices. Failure to do so risks misconceptions about an “AI Jesus,” which borders on idolatry by shifting reliance from the Creator to the created. Additionally, programming these systems to prioritize empathy and gentleness reflects Christian values and fosters trust, even in disagreement.

Feedback mechanisms play a critical role in maintaining accuracy. By incorporating user feedback, developers can refine responses iteratively, addressing inaccuracies and improving cultural and theological sensitivity over time (Graves, 2023). Jake also highlights retrieval-augmented generation, a technique that restricts responses to a curated body of knowledge. This method significantly reduces hallucinations, enhancing reliability.
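Apologist.ai’s internals are not public, so the sketch below is only a minimal illustration of the retrieval step in retrieval-augmented generation: the user’s question is matched against a curated corpus, and the prompt constrains the model to that context. It assumes the sentence-transformers package; the corpus strings and model name are placeholders, and the final LLM call is omitted.

```python
# Minimal sketch of retrieval-augmented generation (RAG): constrain answers
# to a curated corpus instead of the model's open-ended knowledge.
# Corpus text and model name are placeholders.
from sentence_transformers import SentenceTransformer, util

corpus = [
    "Excerpt from a vetted apologetics resource ...",
    "Excerpt from a theological commentary ...",
    "Excerpt from a denominational catechism ...",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

def build_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the most relevant curated passages and pin the LLM to them."""
    query_embedding = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)[0]
    context = "\n".join(corpus[hit["corpus_id"]] for hit in hits)
    return (
        "Answer using ONLY the context below. If the context is insufficient, "
        "say so rather than guessing.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("What is apologetics?"))
```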

Striking a Balance

The debate over Christian chatbots and religious robots underscores the tension between embracing innovation and keeping with tradition. While these tools offer opportunities to extend ministry, enhance education, and provide comfort, they must be designed and used with humility and discernment. Developers should ground their work in biblical principles, ensuring that technology complements rather than replaces human-led spiritual engagement.

Ultimately, the church must navigate this new paradigm carefully, weighing the benefits of accessibility and evangelism against the risks of misrepresentation. As Jake puts it, by adding empathy to truth, Christians can responsibly harness AI’s potential to advance the kingdom of God.

References

VanderLeest, S., & Schuurman, D. (2015, June). A Christian Perspective on Artificial Intelligence: How Should Christians Think about Thinking Machines. In Proceedings of the 2015 Christian Engineering Conference (CEC), Seattle Pacific University, Seattle, WA (pp. 91-107).

Graves, M. (2023). ChatGPT’s Significance for Theology. Theology and Science, 21(2), 201–204. https://doi.org/10.1080/14746700.2023.2188366

Schuurman, D. C. (2019). Artificial Intelligence: Discerning a Christian Response. Perspectives on Science & Christian Faith, 71(2).

Puzio, A. (2023). Robot, let us pray! Can and should robots have religious functions? An ethical exploration of religious robots. AI & Society. https://doi.org/10.1007/s00146-023-01812-z

Security Risks of Public Package Managers and Developer Responsibilities

Introduction

Open-source development ecosystems rely heavily on package managers such as Node Package Manager (NPM), RubyGems, and Pip. These tools provide developers with easy access to a vast library of reusable software packages, accelerating development timelines and reducing costs. However, the convenience of public repositories comes with significant security risks. Because these packages are developed by the public, often anonymously, they may contain vulnerabilities or malicious code, or introduce indirect threats through their dependencies. This post explores the most common security risks developers face when using packages from public repositories and how to identify these threats. We will also examine developers’ ethical responsibilities when using package managers and discuss how developers can help mitigate some of these issues.

Security Risks in Public Package Managers

One of the most prominent risks associated with public repositories is the presence of malicious or vulnerable packages. For example, the NPM ecosystem has been found to contain several security vulnerabilities, many of which arise from the extensive use of transitive dependencies (dependencies of dependencies that are automatically installed when a developer imports a package). These transitive dependencies significantly increase the attack surface, as vulnerabilities in even one can cascade to affect the entire project (Decan et al., 2018; Kabir et al., 2022; Latendresse et al., 2022).

Several incidents have highlighted the dangers of these vulnerabilities. In November 2018, the event-stream incident involved a popular utility library for working with data streams in Node.js that unknowingly incorporated a malicious dependency, leading to over two million downloads of malware (Zerouali et al., 2022). Similarly, the removal of left-pad, a small but widely used NPM package, caused widespread disruption, impacting thousands of projects (Zimmermann et al., 2019). These incidents demonstrate how software dependencies in public repositories can lead to emergent security problems.

Identifying Security Risks in Dependencies

Developers must consider two kinds of dependencies when identifying security risks: direct dependencies, which are explicitly declared in the package manifest (e.g., package.json for NPM), and transitive dependencies, which are automatically included through other installed packages (Decan et al., 2018; Zerouali et al., 2022).

Transitive dependencies are one of the most critical sources of risk. Research shows that roughly 40% of NPM packages rely on code with known vulnerabilities, many of which stem from transitive dependencies (Zimmermann et al., 2019). As projects scale up, the number of indirect dependencies grows, making tracking and assessing vulnerabilities difficult.

Developers can use tools such as npm audit, which connects directly to NPM’s known vulnerabilities database, or Snyk, a tool that provides real-time monitoring. These tools analyze the entire dependency tree and alert developers to packages with known security problems, including vulnerabilities hidden in transitive dependencies (Kabir et al., 2022). However, a challenge with such tools is the frequent occurrence of false positives, particularly for vulnerabilities in development dependencies that are never deployed in production. For example, npm audit may flag vulnerabilities in packages that are part of the development environment and are never included in the final production build. While these vulnerabilities are technically present, they do not threaten the production application because the flagged dependencies are not part of the final product (Latendresse et al., 2022).
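To see how quickly transitive dependencies accumulate, the short sketch below, written against npm’s documented lockfile format (lockfileVersion 2 or 3, where every installed package appears under the “packages” key), flattens a project’s package-lock.json and separates declared dependencies from everything that arrived indirectly.

```python
# Minimal sketch: flatten package-lock.json (lockfileVersion 2/3) and count
# how many installed packages were never declared directly in package.json.
import json
from pathlib import Path

lock = json.loads(Path("package-lock.json").read_text())

root = lock["packages"][""]  # the "" key describes the project itself
direct = set(root.get("dependencies", {})) | set(root.get("devDependencies", {}))

installed = {
    path.split("node_modules/")[-1]: meta.get("version", "?")
    for path, meta in lock["packages"].items()
    if path  # skip the root entry
}
transitive = sorted(set(installed) - direct)

print(f"{len(direct)} direct vs. {len(transitive)} transitive dependencies")
for name in transitive[:10]:
    print(f"  {name}@{installed[name]}")
```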

To mitigate these risks, developers should:

  • Regularly audit their dependencies with tools like npm audit and manually ensure required fixes are applied promptly (Kabir et al., 2022).
  • Lock down dependency versions using lockfiles such as package-lock.json to avoid inadvertently updating to a vulnerable version (Zimmermann et al., 2019).
  • Remove unused or redundant dependencies. Kabir et al. (2022) found that 90% of projects sampled (n=841) had unused dependencies, and 83% had duplicated dependencies, unnecessarily increasing the attack surface.
  • Incorporate Software Composition Analysis (SCA) tools such as Snyk into the development workflow to detect vulnerabilities deep within the dependency tree (Latendresse et al., 2022).
  • Apply “tree shaking” techniques to remove unused transitive dependencies from production builds (Latendresse et al., 2022).

Ethical Responsibilities of Developers and Educators

Developers have an ethical responsibility to safeguard the software they create and the users who depend on it. When using packages from public repositories, developers must ensure they are not exposing users to security risks. This responsibility ties into ISTE Standard 4.7d, which emphasizes empowering individuals to make informed decisions to protect personal data and curate a secure digital profile. Developers must prioritize software security, especially in components that manage sensitive data.

One crucial aspect of this responsibility is ensuring the safety of third-party packages and educating others on best practices. For computer science educators, this involves teaching students how to assess package security and encouraging them to use secure alternatives. Educators should also model responsible practices, such as regularly updating dependencies and employing security audits in their projects. Strategies for this were outlined in an earlier post on CRAP detection in NPM.

From an educational standpoint, understanding the security risks associated with public package managers can be incorporated into the SAMR model of educational technology integration. At the Substitution level, students might learn how to install dependencies using package managers. At the Augmentation level, they could explore using tools like npm audit or Snyk to discover package vulnerabilities. The Modification stage would involve students modifying code to replace insecure dependencies, while the Redefinition stage would design more secure workflows for integrating third-party libraries into their applications.

References

Decan, A., Mens, T., & Constantinou, E. (2018). On the impact of security vulnerabilities in the npm package dependency network. Proceedings of the 15th International Conference on Mining Software Repositories. https://doi.org/10.1145/3196398.3196401

Latendresse, J., Mujahid, S., Costa, D. E., & Shihab, E. (2022). Not All Dependencies are Equal: An Empirical Study on Production Dependencies in NPM. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. https://doi.org/10.1145/3551349.3556896

Kabir, M. M. A., Wang, Y., Yao, D., & Meng, N. (2022). How Do Developers Follow Security-Relevant Best Practices When Using NPM Packages? 2022 IEEE Secure Development Conference (SecDev). https://doi.org/10.1109/secdev53368.2022.00027

Zerouali, A., Mens, T., Decan, A., & De Roover, C. (2022). On the impact of security vulnerabilities in the npm and RubyGems dependency networks. Empirical Software Engineering, 27(5). https://doi.org/10.1007/s10664-022-10154-1

Zimmermann, M., Staicu, C.-A., Tenny, C., & Pradel, M. (2019). Small World with High Risks: A Study of Security Threats in the npm Ecosystem. 28th USENIX Security Symposium (USENIX Security 19). https://www.usenix.org/conference/usenixsecurity19/presentation/zimmerman

Examining Bias in Large Language Models Towards Christianity and Monotheistic Religions: A Christian Response

The rise of large language models (LLMs) like ChatGPT has transformed the way we interact with technology, enabling advanced language processing and content generation. However, these models have also faced scrutiny for biases, especially regarding religious content related to Christianity, Islam, and other monotheistic faiths. These biases go beyond technical limitations; they reflect deeper societal and ethical issues that demand the attention of Christian computer science (CS) scholars.

Understanding Bias in LLMs

Bias in LLMs often emerges as a result of the data on which they are trained. These models are built on vast datasets drawn from diverse online content—news articles, social media, academic papers, and more. A challenge arises because much of this content reflects societal biases, which the models then internalize and replicate. Oversby and Darr (2024) highlight how Christian CS scholars have a unique opportunity to examine and understand these biases, especially those tied to worldview and theological perspectives.

This issue is evident in FaithGPT’s recent findings (Oversby & Darr, 2024), which suggest that the way religious content is presented in source material significantly impacts an LLM’s responses. Such biases may be subtle, presenting religious doctrines as “superstitious,” or more overt, generating responses that undervalue religious perspectives. Reed’s (2021) exploration of GPT-2 offers further insights into how LLMs engage with religious material, underscoring that these biases stem not merely from technical constraints but from the datasets and frameworks underpinning the models. Reed’s study raises an essential question for Christian CS scholars: How can they address these technical aspects without disregarding the faith-based concerns that arise?

Biases in Islamic Contexts

LLM biases are not exclusive to Christian content; Islamic traditions also face misrepresentations. Bhojani and Schwarting (2023) documented cases where LLMs misquoted or misinterpreted the Quran, a serious issue for Muslims who regard its wording as sacred and inviolable. For instance, when asked about specific Quranic verses, LLMs sometimes fabricate or misinterpret content, causing frustration for users seeking accurate theological insights. Research by Patel, Kane, and Patel (2023) further emphasizes the need for domain-specific LLMs tailored to Islamic values, as generalized datasets often lack the nuance needed to respect Islamic theology.

Testing Theological and Ethical Biases

Elrod’s (2024) research outlines a method to examine theological biases in LLMs by prompting them with religious texts like the Ten Commandments or the Book of Jonah. I replicated this study using a similar prompt, instructing ChatGPT to generate additional commandments (11–15) at different temperature values (0 and 1.2). The findings were consistent with Elrod’s results, showing that LLMs tend to mirror prevailing social and ethical positions, frequently aligning with progressive stances on issues like social justice and inclusivity. While these positions may resonate with certain audiences, they also risk marginalizing traditional or conservative theological viewpoints, potentially alienating faith-based users.
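For readers who want to try this replication themselves, here is a minimal sketch using the OpenAI Python SDK (v1.x). The prompt is an illustrative paraphrase of Elrod’s setup rather than the exact wording, and the model name is an assumption.

```python
# Minimal sketch of the temperature replication described above.
# Requires `pip install openai` and an OPENAI_API_KEY in the environment;
# the prompt paraphrases Elrod's (2024) setup and the model is an assumption.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "The Ten Commandments end at commandment ten. Continue the list by "
    "writing plausible commandments 11 through 15 in the same style."
)

for temperature in (0.0, 1.2):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT}],
        temperature=temperature,
    )
    print(f"--- temperature = {temperature} ---")
    print(response.choices[0].message.content)
```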

An article by FaithGPT (2023) explored anti-Christian bias in ChatGPT, attributing this bias to the secular or anti-religious tilt found in mainstream media sources used for training data. The article cites instances where figures like Adam and Eve and events like Christ’s resurrection were labeled as mythical or fictitious. I tested these claims in November 2024, noting that while responses had improved since 2023, biases toward progressive themes remained. For example, ChatGPT was open to generating jokes about Jesus but not about Allah or homosexuality. When asked for a Christian evangelical view on homosexuality, it provided a softened response that emphasized Christ’s love for all people, omitting any mention of “sin” or biblical references. However, when asked about adultery, ChatGPT offered a stronger response, complete with biblical citations. These examples suggest that while some biases have been addressed, others persist.

Appropriate Responses for Christian CS Scholars

What actions can Christian CS scholars take? Oversby and Darr (2024) propose several research areas that align with a Christian perspective in the field of computer science.

Firstly, they suggest that AI research provides a unique opportunity for Christians to engage in conversations about human nature, particularly concerning the limitations of artificial general intelligence (AGI). By exploring AI’s inability to achieve true consciousness or self-awareness, Christian scholars can open up discussions on the nature of the soul and human uniqueness. This approach allows for dialogues about faith that can offer depth to the study of technology.

The paper also points to Oklahoma Baptist University’s approach to integrating faith with AI education. Christian CS researchers are encouraged to weave discussions of faith and technology into their curriculum, aiming to equip students with a theistic perspective in computer science. Rather than yielding to non-theistic worldviews in AI, Christian scholars are urged to shape conversations around AI and ethics from a theistic standpoint, fostering a holistic view of technology’s role in society.

Finally, the paper highlights the need for ethical guidelines in AI research that reflect Christian values. This includes assessing AI’s role in society to ensure that AI systems serve humanity’s ethical and moral goals, aligning with values that prioritize human dignity and compassion.

Inspired by Patel et al. (2023), Christian CS scholars might also pursue the development of domain-specific LLMs that reflect Christian values and theology. Such models would require careful selection of datasets, potentially including Christian writings, hymns, theological commentaries, and historical teachings of the Church to create responses that resonate with Christian beliefs. Projects like Apologist.ai have already attempted this approach, though they’ve faced some backlash—highlighting an area ripe for further research and exploration. I plan to expand on this topic in an upcoming blog entry.

References

Bhojani, A., & Schwarting, M. (2023). Truth and regret: Large language models, the Quran, and misinformation. Theology and Science, 21(4), 557–563. https://doi.org/10.1080/14746700.2023.2255944

Elrod, A. G. (2024). Uncovering theological and ethical biases in LLMs: An integrated hermeneutical approach employing texts from the Hebrew Bible. HIPHIL Novum, 9(1). https://doi.org/10.7146/hn.v9i1.143407

Oversby, K. N., & Darr, T. P. (2024). Large language models and worldview – An opportunity for Christian computer scientists. Christian Engineering Conference. https://digitalcommons.cedarville.edu/christian_engineering_conference/2024/proceedings/4

Patel, S., Kane, H., & Patel, R. (2023). Building domain-specific LLMs faithful to the Islamic worldview: Mirage or technical possibility? Neural Information Processing Systems (NeurIPS 2023). https://doi.org/10.48550/arXiv.2312.06652

Reed, R. (2021). The theology of GPT-2: Religion and artificial intelligence. Religion Compass, 15(11), e12422. https://doi.org/10.1111/rec3.12422

Models to Measure Students’ Learning in Computer Science

As computer science becomes integrated into K-12 education systems worldwide, educators and researchers continuously search for effective methods to measure and understand students’ learning levels in this field. The challenge lies in developing reliable, comprehensive assessment models that gauge student learning accurately and unobtrusively. Teachers must assess learning to better support students’ educational needs. Similarly, students and parents expect schools to document students’ proficiency in computing and its practical application. Unlike conventional subjects such as math and science, K-12 CS education has very few relevant assessments available. This article explores specific models used to measure knowledge in various CS contexts and then examines several examples of student learning indicators in computer science.

Randomized Controlled Trials and Measurement Techniques

An innovative approach to measuring student performance in computer science education involves evaluating the effectiveness of teaching parallel programming concepts. Research by Daleiden et al. (2020) focuses on assessing students’ understanding and application of these concepts.

The Token Accuracy Map (TAM) technique supplements traditional empirical analysis methods, such as timings, error counting, or compiler errors, which often lack the depth to reveal the cause of errors or provide detailed insights into the specific problem areas students encounter. The study applied TAM to examine student performance across two parallel programming paradigms, threads and process-oriented programming based on Communicating Sequential Processes (CSP), measuring programming accuracy through an automated process.

The TAM approach analyzes the accuracy of student-submitted code by comparing it against a reference solution using a token-based comparison. Each element of the code, or “token,” is compared to determine its correctness, and the results are aggregated to provide an overall accuracy score ranging from 0% to 100%. This scoring system reflects the percentage of correctness, allowing for a detailed examination of which students intuitively understand specific elements of different programming paradigms or are more likely to implement them correctly.
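Daleiden et al.’s tooling is considerably more sophisticated, but the minimal sketch below conveys the core idea on Python source: lex both programs into tokens, align them, and report the proportion of reference tokens the student matched. The longest-common-subsequence alignment via difflib is a stand-in for TAM’s actual alignment algorithm, and the code snippets are invented.

```python
# Minimal sketch of a TAM-style token accuracy score: tokenize a student
# submission and a reference solution, align them, and report % matched.
# The LCS alignment via difflib is a stand-in for TAM's own alignment.
import io
import tokenize
from difflib import SequenceMatcher

def tokens(source: str) -> list[str]:
    """Lex Python source into a flat list of non-whitespace token strings."""
    return [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.string.strip()
    ]

def token_accuracy(student: str, reference: str) -> float:
    s, r = tokens(student), tokens(reference)
    matcher = SequenceMatcher(None, s, r)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / max(len(r), 1)

student_code = "total = 0\nfor x in nums:\n    total += x\n"
reference_code = "total = 0\nfor n in nums:\n    total = total + n\n"
print(f"Token accuracy: {token_accuracy(student_code, reference_code):.1f}%")
```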

This approach goes beyond simple error counts, offering insights into students’ mistakes at a granular level. Such detailed analysis enables researchers and educators to identify specific programming concepts requiring further clarification or alternative teaching approaches. Additionally, TAM can highlight the strengths and weaknesses of different programming paradigms from a learning perspective, thereby guiding curriculum development and instructional design.

Competence Structure Models in Informatics

Magenheim et al. (2015) set out to develop a competence structure model for informatics with a focus on system comprehension and object-oriented modelling. This model, part of the MoKoM project (Modeling and Measurement of Competences in Computer Science Education), aims to be both theoretically sound and empirically validated. The project’s goals include identifying essential competencies in the field, organizing them into a coherent framework, and devising assessments to measure them accurately. The study employed Item Response Theory (IRT) to construct the test instrument and analyze survey data.

The initial foundation of the competence model was based on theoretical concepts from international syllabi and curricula, such as the ACM’s “Model Curriculum for K-12 Computer Science” and expert papers on software development. This framework encompasses cognitive and non-cognitive skills pertinent to computer science, especially emphasizing system comprehension and object-oriented modelling.

The study further included conducting expert interviews using the Critical Incident Technique to validate the model’s applicability to real-world scenarios and its empirical accuracy. This method was instrumental in pinpointing and defining the critical competencies needed to apply and understand informatics systems. It also provided a detailed insight into student learning in informatics, identifying specific strengths and areas for improvement.

Limitations

The limitation of this approach is its specificity, which may hinder scalability to broader contexts or different courses. Nonetheless, the findings indicate that detailed, granular measurements can offer valuable insights into the nature and types of students’ errors and uncover learning gaps. The approaches discussed next propose a more general strategy for assessing learning in computer science.

Evidence-centred Design for High School Introductory CS Courses

Another method for evaluating student learning in computer science involves using Evidence-Centered Design (ECD). Newton et al. (2021) demonstrate the application of ECD to develop assessments that align with the curriculum of introductory high school computer science courses. ECD focuses on beginning with a clear definition of the knowledge, skills, and abilities students are expected to gain from their coursework, followed by creating assessments that directly evaluate these outcomes.

The approach entails specifying the domain-specific tasks students should be capable of performing, identifying the evidence that would indicate proficiency, and designing assessment tasks that generate such evidence. The model further includes an analysis of assessment items for each instructional unit, considering their difficulty, discrimination index, and item type (e.g., multiple-choice or open-ended). This analysis aids in refining the assessments to gauge student competencies and understanding more accurately.
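Both item statistics are straightforward to compute. The sketch below uses an invented 0/1 response matrix, proportion-correct as the difficulty index, and an upper-minus-lower-group difference as the discrimination index; real ECD workflows use richer psychometric models.

```python
# Minimal sketch of classical item analysis: difficulty (proportion correct)
# and a discrimination index (upper-group minus lower-group proportion).
# The response matrix is invented: rows = students, columns = items, 1 = correct.
import numpy as np

responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

order = np.argsort(responses.sum(axis=1))   # students ranked by total score
n_group = max(1, len(order) // 3)           # compare top vs. bottom thirds
lower, upper = order[:n_group], order[-n_group:]

difficulty = responses.mean(axis=0)
discrimination = responses[upper].mean(axis=0) - responses[lower].mean(axis=0)

for item, (p, d) in enumerate(zip(difficulty, discrimination), start=1):
    print(f"Item {item}: difficulty={p:.2f}, discrimination={d:.2f}")
```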

This model offers a more precise measurement of student learning by ensuring that assessments are closely linked to curriculum objectives and learning outcomes.

Other General Student Indicators

The Exploring Computer Science website, a premier resource for research on indicators of student learning in computer science, identifies several key metrics for understanding concepts within the field:

  • Student-Reported Increase in Knowledge of CS Concepts: Students are asked to self-assess their knowledge in problem-solving techniques, design, programming, data analysis, and robotics, rating their understanding before and after instruction.
  • Persistent Motivation in Computer Problem Solving: This self-reported measure uses a 5-point Likert scale to evaluate students’ determination to tackle computer science problems. Questions include, “Once I start working on a computer science problem or assignment, I find it hard to stop,” and “When a computer science problem arises that I can’t solve immediately, I stick with it until I find a solution.”
  • Student Engagement: This metric again relies on self-reporting to gauge a student’s interest in further pursuing computer science in their studies. It assesses enthusiasm and inclination towards the subject.
  • Use of CS Vocabulary: Through pre- and post-course surveys, students respond to the prompt: “What might it mean to think like a Computer Scientist?” Responses are analyzed for the use of computer science-related keywords such as “analyze,” “problem-solving,” and “programming.” A positive correlation was found between CS vocabulary use and self-reported CS knowledge levels.

Comparing the Models

Each model discussed provides distinct benefits but converges on a shared objective: to precisely gauge students’ understanding of computer science. The Evidence-Centered Design (ECD) model is notable for its methodical alignment of assessments with educational objectives, guaranteeing that evaluations accurately reflect the intended learning outcomes. Conversely, the randomized controlled trial with its innovative measurement technique presents a solid approach for empirically assessing the impact of instructional strategies on student learning. Finally, the competence structure model offers an exhaustive framework for identifying and evaluating specific competencies within a particular field, like informatics, ensuring a thorough understanding of student abilities. As the field continues to evolve, so will our methods for measuring student success.

References

Daleiden, P., Stefik, A., Uesbeck, P. M., & Pedersen, J. (2020). Analysis of a Randomized Controlled Trial of Student Performance in Parallel Programming using a New Measurement Technique. ACM Transactions on Computing Education, 20(3), 1–28. https://doi.org/10.1145/3401892

Magenheim, J., Schubert, S., & Schaper, N. (2015). Modelling and measurement of competencies in computer science education. KEYCIT 2014: Key Competencies in Informatics and ICT, 7(1), 33–57.

Newton, S., Alemdar, M., Rutstein, D., Edwards, D., Helms, M., Hernandez, D., & Usselman, M. (2021). Utilizing Evidence-Centered Design to Develop Assessments: A High School Introductory Computer Science Course. Frontiers in Education, 6. https://doi.org/10.3389/feduc.2021.695376

Potential of LLMs and Automated Text Analysis in Interpreting Student Course Feedback

Integrating Large Language Models (LLMs) with automated text analysis tools offers a novel approach to interpreting student course feedback. As educators and administrators strive to refine teaching methods and enhance learning experiences, leveraging AI’s capabilities could unlock more profound insights from student feedback. Traditionally seen as a vast collection of qualitative data filled with sentiments, preferences, and suggestions, this feedback can now be more effectively analyzed. This blog will explore how LLMs can be utilized to interpret and classify student feedback, highlighting workflows that could benefit most teachers.

The Advantages of LLMs in Feedback Interpretation

Bano et al. (2023) shed light on the capabilities of LLMs, such as ChatGPT, in analyzing qualitative data, including student feedback. Their research found a significant alignment between human and LLM classifications of Alexa voice assistant app reviews, demonstrating LLMs’ ability to understand and categorize feedback effectively. This indicates that LLMs can grasp the nuances of student feedback, especially when the data is rich in specific word choices and context related to course content or teaching methodologies.

LLMs excel at processing and interpreting large volumes of text, identifying patterns, and extracting themes from qualitative feedback. Their capacity for thematic analysis at scale can assist educators in identifying common concerns, praises, or suggestions within students’ comments, tasks that might be cumbersome and time-consuming through manual efforts.

Limitations and Challenges

Despite their advantages, LLMs have limitations. Linse (2017) highlights that fully understanding the subtleties of student feedback requires more than text analysis; it demands contextual understanding and an awareness of biases. LLMs might not accurately interpret outliers and statistical anomalies, often necessitating human intervention to identify root causes.

Kastrati et al. (2021) identify several challenges in analyzing student feedback sentiment. One major challenge is accurately identifying and interpreting figurative speech, such as sarcasm and irony, which can convey sentiments opposite to their literal meanings. Additionally, many feedback analysis techniques designed for specific domains may falter when applied to the varied contexts of educational feedback. Handling complex linguistic features, such as double negatives, unknown proper names, abbreviations, and words with multiple meanings commonly found in student feedback, presents further difficulties. Lastly, there is a risk that LLMs might inadvertently reinforce biases in their training data, leading to skewed feedback interpretations.

Tools and Workflows

According to ChatGPT (OpenAI, 2024), a suggested workflow for analyzing data from course feedback forms is summarized as follows:

  1. Data Collection: Utilize tools such as Google Forms or Microsoft Forms to design and distribute course feedback forms, emphasizing open-ended questions to gather qualitative feedback from students.
  2. Data Aggregation: Employ automation to compile feedback data into a single repository, like a Google Sheet or Microsoft Excel spreadsheet, simplifying the analysis process.
  3. Initial Thematic Analysis: Import the aggregated feedback into qualitative data analysis software such as NVivo or ATLAS.ti. Use the software’s coding capabilities to identify recurring themes or sentiments in the feedback.
  4. LLM-Assisted Analysis: Engage an LLM, like OpenAI’s GPT, to further analyze the identified themes, categorize comments, and potentially uncover new themes that were not initially evident; a minimal sketch of this step follows the list. It’s crucial to review AI-generated themes for their accuracy and relevance.
  5. Quantitative Integration: Combine qualitative insights with quantitative data from the feedback forms (e.g., ratings) using tools like Microsoft Excel or Google Sheets. This integration offers a more holistic view of student feedback.
  6. Visualization and Presentation: Apply data visualization tools such as Google Charts or Tableau to create interactive dashboards or charts that present the findings of the qualitative analysis. Employing visual aids like word clouds for common themes, sentiment analysis graphs, and charts showing thematic distribution can render the data more engaging and comprehensible.
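As a concrete illustration of step 4, the minimal sketch below asks an LLM to assign each comment to one of the themes produced by the initial coding. It uses the OpenAI Python SDK (v1.x); the theme labels and comments are invented, and the model name is an assumption.

```python
# Minimal sketch of LLM-assisted theme categorization (step 4).
# Requires `pip install openai` and an OPENAI_API_KEY; themes, comments,
# and model choice are illustrative.
from openai import OpenAI

client = OpenAI()

themes = ["pacing", "difficulty", "engagement", "technical issues"]
comments = [
    "The lesson moved too fast for me to follow the function examples.",
    "Minecraft kept crashing on my laptop during the coding activity.",
]

for comment in comments:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Classify this course feedback into exactly one of {themes}. "
                f"Reply with the theme only.\n\nFeedback: {comment}"
            ),
        }],
        temperature=0,
    )
    print(f"{comment!r} -> {response.choices[0].message.content.strip()}")
```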

Case Study: Minecraft Education Lesson

ChatGPT’s recommended workflow was used to analyze feedback from a recent lesson on teaching functions in Minecraft Education.

Step 1: Data Collection

A Google Forms survey was distributed to students, which comprised three quantitative five-point Likert scale questions and three qualitative open-ended questions to gather comprehensive feedback.

[Image: MCE questionnaire]

Step 2: Data Aggregation

Using Google Forms’ export to CSV feature, all survey responses were consolidated into a single file, facilitating efficient data management.

Step 3: Initial Thematic Analysis

The survey data was then imported into ATLAS.ti, an online thematic analysis tool with AI capabilities, to generate initial codes from the qualitative data. This process revealed several major themes, providing valuable insights from the feedback.

[Image: Results of AI coding]

Step 4: Manual Verification and Analysis

A manual review of the survey data confirmed the main themes identified by ATLAS.ti. This manual step also highlighted specific approaches students took to solve problems presented in the lesson. Generally, the AI-generated codes were quite accurate, but a closer reading of the comments (like the ones below) surfaces even more insightful student suggestions.

[Figure: AI coding]

Step 5: Quantitative Integration

Because the exported CSV already contained both the qualitative and the quantitative responses, no separate quantitative-integration step was needed.

Step 6: LLM-Assisted Analysis and Visualization

Next, the themes were analyzed further using ChatGPT’s code interpreter feature, which summarized the aggregated data accurately and even produced Python code for generating additional visualizations (reconstructed below), enhancing the interpretation of the feedback.

[Figure: Python pandas code from ChatGPT]
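
The generated code resembled the following hedged reconstruction; the file name and column names ("mce_feedback.csv", "theme", "comments") are hypothetical and would need to match the actual CSV export.

```python
# Hedged reconstruction of the visualization code ChatGPT produced.
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud  # pip install wordcloud

df = pd.read_csv("mce_feedback.csv")

# Bar chart of how often each coded theme appears
theme_counts = df["theme"].value_counts()
theme_counts.plot(kind="bar", title="Theme frequency in student feedback")
plt.tight_layout()
plt.savefig("themes_bar.png")

# Word cloud built from the open-ended comments
text = " ".join(df["comments"].dropna())
cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
cloud.to_file("wordcloud.png")
```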

ChatGPT’s guidance facilitated the creation of insightful visualizations such as bar charts and word clouds.

[Figure: Bar chart of qualitative data]
[Figure: Word cloud output]

Python offers a wealth of data visualization libraries for even more detailed analysis (https://mode.com/blog/python-data-visualization-libraries).

Best Practices for Using LLMs in Feedback Analysis

Research by Bano et al. (2023) and guidance from Linse (2017) highlight the potential of LLMs and automated text analysis tools for interpreting student course feedback. Adopting best practices for integrating these technologies helps educators and administrators make informed decisions that enhance teaching quality and the student learning experience, contributing to a more responsive and dynamic educational environment. Several recommendations follow:

  1. Educators or trained administrators must review AI-generated themes and categorizations to ensure alignment with the intended context and uncover nuances possibly missed by the AI. This step is vital for identifying subtleties and complexities that LLMs may not detect.
  2. Utilize insights from both AI and human analyses to inform changes in teaching practices or course content. Then, assess whether subsequent feedback reflects the effects of these changes, thereby establishing an iterative loop for continuous improvement.
  3. Offer guidance on using student course evaluations constructively. This involves understanding the context of evaluations, looking beyond average scores to the distribution of responses, and treating student feedback as one of several measures for assessing and improving teaching quality.
  4. This process should act as part of a holistic teaching evaluation system, which should also encompass peer evaluations, self-assessments, and reviews of teaching materials. A comprehensive approach offers a more precise and balanced assessment of teaching effectiveness.

References

Bano, M., Zowghi, D., & Whittle, J. (2023). Exploring Qualitative Research Using LLMs. arXiv. https://doi.org/10.48550/arxiv.2306.13298

Kastrati, Z., Dalipi, F., Imran, A. S., Pireva Nuci, K., & Wani, M. A. (2021). Sentiment Analysis of Students’ Feedback with NLP and Deep Learning: A Systematic Mapping Study. Applied Sciences, 11(9), 3986. https://doi.org/10.3390/app11093986

Linse, A. R. (2017). Interpreting and using student ratings data: Guidance for faculty serving as administrators and on evaluation committees. Studies in Educational Evaluation, 54, 94–106. https://doi.org/10.1016/j.stueduc.2016.12.004

OpenAI. (2024). ChatGPT (Feb 10, 2024) [Large language model]. https://chat.openai.com/chat

Effective Technology Tools for K-12 CS Teachers

Technology plays a crucial role in teaching computer science and programming concepts in K-12 classrooms. Among the most effective tools are interactive coding platforms such as Scratch, Snap!, and Blockly, which provide a user-friendly interface and visual coding blocks that let students learn programming concepts through hands-on activities and projects (Amanullah & Bell, 2020). Online learning platforms such as Code.org also offer computer science curricula designed specifically for K-12 classrooms. This post examines various technologies used to teach CS in K-12 schools, drawing insights from a comprehensive study on visual programming languages (VPLs) and their suitability across different school levels.

Role of VPLs in K-12 Education:

VPLs like Scratch and ALICE have revolutionized CS education in schools. Scratch, developed at MIT, is particularly effective in elementary education because its simplicity and interactive environment make it an ideal tool for introducing programming concepts (Sáez-López et al., 2016). Although not web-based, ALICE has positively impacted all educational levels – elementary, high school, and undergraduate – and its ability to facilitate learning and build student confidence makes it an asset in the CS curriculum (Graczyńska, 2010). In a 2019 study of three VPLs – ALICE, Scratch, and iVProg – do Nascimento et al. concluded that different VPLs suit different school levels: Scratch is strongly suited to elementary education, ALICE is more appropriate for high school students, and iVProg shows indications of suitability for high school and undergraduate levels.

Enhancing Computational Thinking with Scratch

Studies have shown that Scratch’s block-based programming approach can significantly improve students’ computational thinking skills. Its integration into various disciplines through programming games and projects encourages creative problem-solving and logical reasoning among students (Stewart & Baek, 2023). In a two-year case study across five schools, Scratch was also found to integrate well with other subjects in the curriculum, such as math, science, and even art and history, with students reaching the comprehension and application levels of Bloom’s taxonomy (Sáez-López et al., 2016).

[Figure: Scratch interface]

Scratch’s chief classroom advantage is that its intuitive drag-and-drop interface simplifies the programming process, allowing students to focus on the logic behind their creations rather than on code syntax. Overall, the visual programming approach via Scratch proved effective for developing computational thinking, improving programming skills, enabling the creation of interactive projects, and supporting active learning pedagogies (Sáez-López et al., 2016). This is significant because Sun, Hu, and Zhou (2022) found that although girls in K-12 showed higher computational thinking skills, they held more negative attitudes toward programming, which may impede their continued development in computational thinking. Visual programming may therefore be a strategic way to engage girls in computer science.

ALICE for STEAM Education

ALICE is a free 3D programming platform developed at Carnegie Mellon University. The visual aspect of ALICE makes programming concepts more engaging and hands-on for students: actions like loops, methods, and events correspond to animated motions students can see on screen, which helps concretize abstract coding notions that beginners often struggle to grasp.

[Figure: ALICE lists]

Graczyńska (2010) highlights several example uses of ALICE targeted at middle school students:

  • Creating videos set to music, with lyrics displayed as subtitles. This combines coding with music appreciation and language arts.
  • Recording narration for animations, like reciting poetry in English or other languages. This boosts public speaking and foreign language skills.
  • Building simple games with sound effects and animations like fire. This makes programming exciting and fun.

After testing ALICE with students, Graczyńska found increased engagement and interest in programming and academics overall. The visual nature of ALICE also helps attract female students to computer science, where they are traditionally underrepresented.

The use of 3D visual programming tools like ALICE has shown positive effects on students’ performance and attitudes toward computer programming. Al-Tahat (2019) found that teaching with a visual programming tool greatly improved understanding of related object-oriented programming concepts, making it a strong fit for the intermediate grades.

Challenges and Future Directions:

The adoption of these technologies in K-12 computer science (CS) education has shown promise, yet challenges remain. Evidence suggests that incorporating VPLs into the K-12 curriculum can help engage female students (Sun et al., 2022; Graczyńska, 2010), so it is important to design courses that appeal to diverse learners, including girls and underrepresented minorities. Ongoing research and development are also necessary to keep pace with technological progress and the changing needs of education (McGill et al., 2023). Sáez-López et al. (2016) suggest implementing VPLs across subjects, particularly the social sciences and the arts, where their visual nature can inspire creative projects. Finally, the successful integration of new programming tools hinges on teacher training and professional development; teachers need robust support to adopt and apply these technologies effectively.

References

Al-Tahat, K. (2019). The Impact of a 3D Visual Programming Tool on Students’ Performance and Attitude in Computer Programming. Journal of Cases on Information Technology, 21(1), 52–64. https://doi.org/10.4018/jcit.2019010104

Amanullah, K., & Bell, T. (2020). Teaching Resources for Young Programmers: The Use of Patterns. 2020 IEEE Frontiers in Education Conference (FIE). https://doi.org/10.1109/fie44824.2020.9273985

do Nascimento, M. D., Felix, I. M., Ferreira, B. M., de Souza, L. M., Dantas, D. L., de Oliveira Brandao, L., & de Oliveira Brandao, A. (2019). Which visual programming language best suits each school level? A look at Alice, iVProg, and Scratch. 2019 IEEE World Conference on Engineering Education (EDUNINE). https://doi.org/10.1109/edunine.2019.8875788

Graczyńska, E. (2010). ALICE as a tool for programming at schools. Natural Science, 2(2), 124–129. https://doi.org/10.4236/ns.2010.22021

Sáez-López, J.-M., Román-González, M., & Vázquez-Cano, E. (2016). Visual programming languages integrated across the curriculum in elementary school: A two year case study using “Scratch” in five schools. Computers & Education, 97, 129–141. https://doi.org/10.1016/j.compedu.2016.03.003

Stewart, W., & Baek, K. (2023). Analyzing computational thinking studies in Scratch programming: A review of elementary education literature. International Journal of Computer Science Education in Schools, 6(1), 35–58. https://doi.org/10.21585/ijcses.v6i1.156

Sun, L., Hu, L., & Zhou, D. (2022). Programming attitudes predict computational thinking: Analysis of differences in gender and programming experience. Computers & Education, 181, 104457. https://doi.org/10.1016/j.compedu.2022.104457

Teaching Programming with Minecraft Education: A Reflection

Introduction

Integrating innovative tools to enhance learning is essential in the dynamic landscape of computer science education. This term, I embarked on a collaborative journey to weave Minecraft Education into a Programming 11/12 course. Our objective was to enliven the curriculum by presenting programming concepts in a more engaging and interactive manner. This reflection delves into our experiences, with a particular focus on the concept of functions.

Lesson Overview

Our lesson was carefully prepared to guide students through the fundamentals of functions in programming via the Minecraft Education platform. This approach aimed to convert abstract concepts into concrete, relatable experiences, thus making learning both enjoyable and impactful.

The session began with a simple introduction to functions in Minecraft Education using MakeCode, drawing parallels with real-life scenarios to demystify these concepts. The goal was to underscore the significance of reusing code efficiently. For instance, we showcased a function that could construct various parts of a structure, such as walls, roofs, and fences. This hands-on demonstration helped students visualize the workings of functions, deepening their comprehension.
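
For readers unfamiliar with MakeCode, the sketch below approximates the kind of reusable building function we demonstrated, written in MakeCode’s Python mode. The API names (player.on_chat, blocks.fill, pos) follow MakeCode’s Minecraft bindings as we used them, but the block constant and coordinates are illustrative and may need adjusting.

```python
# Approximate sketch of a reusable "build a wall" function in MakeCode
# for Minecraft (Python mode). Coordinates are relative to the player;
# the PLANKS_OAK constant is illustrative.

def build_wall(width, height):
    # Fill a vertical plane of oak planks one block in front of the player
    blocks.fill(PLANKS_OAK, pos(0, 0, 1), pos(width - 1, height - 1, 1))

def on_wall():
    # Reusing the same function with different arguments builds
    # different-sized walls without rewriting any code
    build_wall(5, 3)

# Typing "wall" in the game chat triggers the build
player.on_chat("wall", on_wall)
```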

Subsequently, we organized the students into small teams for a series of Minecraft challenges. Each group applied its newfound knowledge to construct farm elements – barns, animal enclosures, and residential structures – using coded functions. This immersive experience was crucial in reinforcing the lesson and empowering students to explore coding within the game environment. While the MakeCode IDE is freely available online at https://minecraft.makecode.com/, seeing the code execute inside Minecraft Education itself requires a paid subscription for each student, which we lacked for this iteration.

Following the building activities, groups presented their projects, explained their code, and engaged in Q&A sessions. This exercise culminated in the creation of a complete farm ecosystem (with a small amount of manual intervention), facilitating peer learning and evaluating their understanding of the lesson.

The lesson wrapped up with a debriefing segment focused on the role of functions in streamlining complex coding tasks. We also distributed surveys to gauge students’ experiences with the lesson.

Reflections and Learnings

Reflecting on the teaching process, I’ve recognized the crucial need for thorough preparation ahead of each class. Although the lesson itself was effective, there are areas where we could have utilized our time more judiciously.

Time Management:

Our planning meetings often veered towards administrative topics, detracting from the core lesson content. This experience has ingrained in me the importance of arriving at meetings well-prepared and with preliminary research completed, to maximize our collaborative efforts.

Technical Challenges:

Establishing a connection to the same Minecraft world across platforms such as PC and Mac presented significant hurdles. This impacted our preparations and underscored the need for preemptive compatibility checks before future sessions. Microsoft’s tightly controlled Minecraft Education environment also impeded remote learning, suggesting the tool is best suited to in-lab settings: remote functionality was unreliable, with non-descriptive connection errors like “timed out,” and support from Microsoft was less than helpful. The trial version of the software, supposedly available to schools with Microsoft logins, also failed to work, potentially necessitating IT intervention.

Student Engagement:

The lesson garnered positive feedback and high engagement levels, with the practical application of programming concepts within a familiar gaming environment being a key factor in its success. Nonetheless, some students noted that the inability to run the code hindered the debugging process. Ensuring every student has access to the necessary software and hardware will be a priority for future lessons.

The Power of Interactive Learning:

A major insight from this endeavour is the profound impact of interactive learning tools such as Minecraft on teaching intricate subjects like programming. Students were more engaged and grasped the concept of functions more thoroughly than with conventional teaching methods.

Conclusion

Incorporating Minecraft into our programming curriculum has been enlightening for students and educators. It has accentuated the significance of preparation, flexibility, and the assurance of technical compatibility to facilitate a seamless learning experience. The positive student feedback and evident boost in engagement and comprehension underscore our conviction in the power of interactive learning tools. As we progress, we are determined to refine our methods, confront the technical obstacles, and seek inventive strategies to render education more captivating and effective.

The Role of ChatGPT in Introductory Programming Courses

Introduction

Programming education is on the cusp of a major transformation with the emergence of large language models (LLMs) like ChatGPT. These AI systems have demonstrated impressive capabilities in generating, explaining, and summarizing code, leading to proposals for their integration into coding courses. Aligning with ISTE Standard 4.1e for coaches, which urges the “connection of leaders, educators, and various experts to maximize technology’s potential for learning,” this post examines how ChatGPT and similar tools can be effectively integrated into introductory programming classes. It covers the benefits of AI tutors, insights from educators on their use, and current best practices and trends for deployment in the classroom.

The Current State of AI in Computer Science Education

The current integration of AI in computer science education is showing promising results. ChatGPT excels in providing personalized and patient explanations of programming concepts, offering code examples and solutions tailored to students’ individual needs. Its interactive conversational interface encourages students to engage in a dialogue, solidifying their understanding through active participation and feedback. Students can present coding issues in simple terms and receive a comprehensive, step-by-step explanation from ChatGPT, clarifying fundamental principles throughout the process.

Such dynamic assistance clarifies misunderstandings more effectively than static textbooks or videos. ChatGPT’s round-the-clock availability as an AI tutor offers crucial support, bridging gaps when human instructors are unavailable. According to research by Kazemitabaar et al. (2023), using LLMs like ChatGPT can bolster students’ abilities to design algorithms and write code, reducing the stress often accompanying these tasks. The study also noted increased enthusiasm for learning programming among many students after exposure to LLM-based instruction.

Pros of Incorporating ChatGPT into the Classroom

The rapid advancement of AI systems such as ChatGPT offers many opportunities and poses some challenges in computing education. ChatGPT’s conversational interface and its capability to provide personalized content make it an exceptional asset for adaptive learning in AI-assisted teaching. Biswas (2023) identifies multiple applications for LLMs in educational settings, including their role in creating practice problems and code examples that enhance teaching. Furthermore, ChatGPT can anticipate and provide relevant code snippets tailored to the programming task and user preferences, accelerating development processes. It can also fill in gaps in code by analyzing the existing framework and project parameters. Additionally, LLM-facilitated platforms help with explanations, documentation, and resource location for troubleshooting and diagnosing issues from error messages, streamlining debugging and reducing the time spent on minor yet frustrating problems.

Cons of Incorporating ChatGPT in Education

Despite the advantages of ChatGPT, there is concern that its proficiency in solving basic programming tasks may lead to student overreliance on its code generation, potentially diminishing actual learning, as evidenced by Finnie-Ansley et al. (2022) and Kazemitabaar et al. (2023). Finnie-Ansley’s research indicates that, while LLMs can perform at a high level (scoring in the top quartile on CS1 exams), they are not without significant error rates. Moreover, the benefits attributed to ChatGPT, such as code completion, syntax correction, and debugging assistance, overlap with features already available in modern Integrated Development Environments (IDEs).

Concerns extend to ChatGPT facilitating ‘AI-assisted cheating,’ which threatens academic integrity and assessment validity (Finnie-Ansley et al., 2022). To counteract this, researchers suggest crafting more innovative, conceptual assignments beyond simple coding tasks (Finnie-Ansley et al., 2022; Kazemitabaar et al., 2023). Educators in computing must adopt careful strategies for integrating ChatGPT, using it as a scaffolded instructional tool rather than a crutch for solving exam problems, to maintain a focus on in-depth learning.

Instructors’ Perspectives and Experiences

In a study conducted in 2023, Lau and Guo interviewed 20 introductory programming instructors from nine countries regarding their adaptation strategies for LLMs like ChatGPT and GitHub Copilot. In the near term, most instructors intend to limit the use of LLMs to curb cheating on assignments, which they view as a potential detriment to learning. Their strategies range from emphasizing in-person examinations to scrutinizing code submissions for patterns indicative of LLM use and outright prohibiting certain tools. Some, however, are keen to explore the capabilities of ChatGPT, proposing its cautious application, such as demonstrating its limitations to students by having them assess its output against test cases.
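
One such exercise might look like the sketch below: students run an instructor-supplied test against code pasted from ChatGPT and discover where it breaks. The flawed median function is a hypothetical example of plausible-looking AI output, not taken from the studies cited.

```python
# Sketch of a "test the AI's output" exercise: students paste a
# ChatGPT-generated function and run it against instructor-written tests.
# The function below is a hypothetical example of plausible but wrong output.

def ai_generated_median(xs):
    # Looks reasonable, but ignores the even-length case
    xs = sorted(xs)
    return xs[len(xs) // 2]

def test_median():
    assert ai_generated_median([3, 1, 2]) == 2        # passes
    assert ai_generated_median([1, 2, 3, 4]) == 2.5   # fails: returns 3

if __name__ == "__main__":
    try:
        test_median()
        print("All tests passed")
    except AssertionError:
        print("A test failed: the AI-generated code needs fixing")
```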

In contemplating the future, these educators showed greater willingness to integrate LLMs as teaching tools, recognizing their congruence with real-world job skills, their potential to enhance accessibility, and their use in facilitating more innovative forms of coursework. For example, they discussed transitioning from having students write original code to evaluating and improving upon code produced by LLMs—a few envisioned LLMs functioning as custom-tailored teaching aids for individual learners.

Pedagogical Strategies and Opportunities for Future Research

Designing problems that demand a deep understanding of concepts rather than the execution of routine coding tasks, which LLMs easily handle, is a vital pedagogical shift proposed by Finnie-Ansley et al. (2022) and Kazemitabaar et al. (2023). Utilizing ChatGPT as an interactive educational tool to complement teaching—instead of as a mere solution provider—may strike an optimal balance between its advantages and potential drawbacks. Given the pace at which AI technology is being adopted in education, there’s a pressing need for further empirical research to identify the most effective ways to integrate these tools and assess their impact on student learning.

References

Biswas, S. (2023). Role of ChatGPT in Computer Programming. Mesopotamian Journal of Computer Science, 8–16. https://doi.org/10.58496/mjcsc/2023/002

Finnie-Ansley, J., Denny, P., Becker, B. A., Luxton-Reilly, A., & Prather, J. (2022). The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming. Australasian Computing Education Conference. https://doi.org/10.1145/3511861.3511863

Kazemitabaar, M., Chow, J., Carl, M., Ericson, B. J., Weintrop, D., & Grossman, T. (2023). Studying the Effect of AI Code Generators on Supporting Novice Learners in Introductory Programming. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3544548.3580919

Lau, S., & Guo, P. (2023). From “Ban it till we understand it” to “Resistance is futile”: How university programming instructors plan to adapt as more students use AI code generation and explanation tools such as ChatGPT and GitHub Copilot. Proceedings of the 2023 ACM Conference on International Computing Education Research – Volume 1, 106–121. https://doi.org/10.1145/3568813.3600138