Operationalising AI Ethics by Natalie Rouse
Operational Ethics and Artificial Intelligence: The Journey Towards Fair and Ethical AI Systems
Hello everyone, my name is Natalie Rouse, and I'm the general manager of Eliza, a leading data science, analytics, and strategy consulting company based in Australia and Aotearoa New Zealand. Ethics has lately emerged as a critical consideration in the development of artificial intelligence (AI).
Unfortunately, we don't have well-defined, standardized methods for embedding these ethical principles within our development lifecycle. Today, we'll look at some of the practical steps we can take to ensure that we develop ethical AI systems.
Why Ethics Matter in AI Development
Recent headlines have been filled with examples of AI technology with unintended negative consequences, from discriminatory hiring models to selectively poor-performing facial recognition models, racial profiling, and deep fakes. These instances have highlighted the crucial importance of ethical considerations in the development and implementation of AI systems.
Key Principles for Ethical AI
To mitigate such issues, many individuals and organizations have started to draft and adopt sets of ethical principles. These principles largely converge around areas such as fairness, transparency, accountability, contestability, privacy and security, and human-centered values such as autonomy and consent.
Agreeing on principles, however, does not automatically bridge the gap between organizational principles and the development lifecycle. Careful thought is needed to integrate these considerations into development and deployment processes without creating excessive administrative overhead. Representation and diversity also matter a great deal, ensuring that models are robust and performant even for underrepresented groups.
Developing in Alignment with Organizational Values
Ethical principles must be developed in alignment with your own organizational values, not merely copied from elsewhere online. This alignment lets your principles pull in the same direction as your values, for better outcomes all around.
Risk Management in the Operationalization of Ethics and AI
Risk management is critical to operationalizing ethics in AI, from risk identification through mitigation to ongoing management. This process gives visibility and builds collective agreement on mitigation strategies, making models in production much easier to understand and manage.
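As an illustration, a risk register can start out very lightweight. The sketch below is a minimal example only: the field names, the likelihood-times-impact scoring, and the two sample risks are all invented for illustration, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    """One entry in a project risk register (illustrative schema)."""
    description: str
    likelihood: int  # 1 (rare) .. 5 (almost certain)
    impact: int      # 1 (negligible) .. 5 (severe)
    mitigation: str = ""

    @property
    def score(self) -> int:
        # Simple likelihood x impact scoring, as in common risk matrices.
        return self.likelihood * self.impact

# Example entries; real registers would be agreed with the whole team.
register = [
    Risk("Model under-performs for underrepresented groups", 3, 4,
         mitigation="Measure accuracy per subpopulation before release"),
    Risk("Training data used without informed consent", 2, 5,
         mitigation="Audit data provenance; document consent basis"),
]

# Surface the highest-scoring risks first for collective review.
for risk in sorted(register, key=lambda r: r.score, reverse=True):
    print(f"[{risk.score:2d}] {risk.description} -> {risk.mitigation}")
```

Even a simple score like this makes risks comparable, which is what enables the collective agreement on mitigation strategies described above.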
Importance of Subpopulation Definition
Defining subgroups within your dataset is an essential aspect of embedding ethics. Without this, we cannot understand what differing outcomes or accuracy levels might mean for different groups of people.
Understanding the Value of Ethical Considerations During Scoping
The scoping phase of an AI project is vital for understanding the potential implications of any solution identified. If one approach is deemed higher-risk than another, that may influence which approach to proceed with.
Ethical Impact Evaluation During AI Model Development
The model development process already includes many iterations of analysis of the training dataset and model performance. Adding tasks to define subpopulations, and then to review both their representation in the input training dataset and model performance against defined metrics for each, is a reasonable addition.
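As a hedged sketch of what that representation review might look like in practice, the snippet below checks each subpopulation's share of a toy training set against a floor. The group labels, the data, and the 15% floor are purely illustrative.

```python
import pandas as pd

# Toy training set: "group" stands in for whatever subpopulation
# definition the project has agreed on (labels here are illustrative).
train = pd.DataFrame({
    "group": ["a"] * 6 + ["b"] * 3 + ["c"] * 1,
    "label": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Share of each subpopulation in the training data.
representation = train["group"].value_counts(normalize=True)
print(representation)

# Flag any group that falls below an agreed representation floor.
FLOOR = 0.15  # illustrative threshold, not a recommendation
underrepresented = representation[representation < FLOOR].index.tolist()
print("Below floor:", underrepresented)
```

A check like this slots naturally into the exploratory data analysis iterations that already happen during model development.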
Reviewing Model Performance for Pre-Determined Sub-Populations
Once you have a trained AI model, you need to review its performance for your predetermined subpopulations. This step is about interrogating your model and finding as many edge cases as possible, to ensure the model has learned the right behavior from the right features and is robust across situations.
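One minimal way to sketch this review, assuming you already have held-out predictions tagged with a subpopulation column: slice accuracy per group and compare the spread against an agreed tolerance. The data and the 10-point tolerance below are invented for illustration.

```python
import pandas as pd

# Held-out evaluation results tagged with subpopulation (toy data).
results = pd.DataFrame({
    "group":  ["a"] * 5 + ["b"] * 5 + ["c"] * 5,
    "y_true": [1, 0, 1, 1, 0,  1, 0, 1, 0, 0,  1, 1, 0, 1, 0],
    "y_pred": [1, 0, 1, 1, 0,  1, 0, 0, 0, 0,  1, 0, 0, 0, 0],
})

# Accuracy sliced per subpopulation.
per_group = (results.assign(correct=results["y_true"] == results["y_pred"])
                    .groupby("group")["correct"].mean())
print(per_group)  # a: 1.0, b: 0.8, c: 0.6

# The spread across groups is what an agreed tolerance applies to.
spread = per_group.max() - per_group.min()
TOLERANCE = 0.10  # illustrative: >10 points of spread triggers review
if spread > TOLERANCE:
    print(f"Accuracy spread {spread:.2f} exceeds tolerance - investigate.")
```

The same per-group slicing extends to any metric (precision, recall, error cost) that better reflects real-world outcomes for your use case.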
Deploying AI Systems Into Production Ethically
Before we can roll out an AI system into production, we must ensure risks have been mitigated throughout the development process. Is a process for informed consent embedded? Is there a clear process for contesting outcomes? Is the level of explainability fit for purpose? These questions are vital.
Monitoring AI Systems Performance in Production
Once your AI system is operational, your ethical efforts are not over. Embedding your ethics Key Performance Indicators (KPIs) within your existing Machine Learning Operations (MLOps) framework, alongside other performance metrics, is the ideal way forward.
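As a minimal sketch of what wiring ethics KPIs into monitoring might look like: a small check that compares reported metrics against agreed floors and emits alerts. The metric names, values, and floors are invented; a real MLOps stack would supply these from its own metric store and alerting system.

```python
def check_ethics_kpis(metrics: dict, thresholds: dict) -> list:
    """Return an alert message for every KPI below its agreed floor."""
    alerts = []
    for name, floor in thresholds.items():
        value = metrics.get(name)
        if value is not None and value < floor:
            alerts.append(f"{name}={value:.2f} below floor {floor:.2f}")
    return alerts

# Illustrative per-subpopulation accuracy from a monitoring run.
latest = {"accuracy_group_a": 0.94, "accuracy_group_b": 0.81}
floors = {"accuracy_group_a": 0.90, "accuracy_group_b": 0.90}

for alert in check_ethics_kpis(latest, floors):
    # In production this could page an owner or trigger retraining.
    print("ALERT:", alert)
```

Because it is just another threshold check, this fits alongside the latency and drift alerts most monitoring setups already run.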
Takeaways
- Create and continually improve a basic ethical framework.
- Be curious about how model performance impacts real-world outcomes.
- Monitor and make visible the performance of your model against your ethical metrics.
In conclusion, while embedding ethics in AI development is not always clear-cut or straightforward, it is undoubtedly worthwhile. I'm happy to discuss this topic further. Thank you, and I hope you enjoy this fantastic conference.
Video Transcription
Hello and welcome everybody to this WomenTech 2022 session on operational ethics and artificial intelligence. I'm Natalie Rouse and I'm the general manager for Eliza, a leading data science, analytics and strategy consulting company based in Australia and Aotearoa New Zealand, which is where I'm coming to you from today. I think we all agree now that ethics is a critical consideration for the development of artificial intelligence, but we don't have well-defined, standardized methods for taking these principles and embedding them within our development life cycle. Today, we'll look at some of the practical steps that we can take to make sure that we're doing everything we can to develop fair and ethical AI systems.
From discriminatory hiring models to selectively poor-performing facial recognition models, and from racial profiling to deep fakes, headlines over the past few years have been overflowing with examples of AI technology with unintended negative consequences. These examples have united people across the world in the understanding of the importance of ethical considerations when it comes to the development and implementation of AI systems.
This understanding has galvanized both individual contributors and organizations into action, with many drafting and adopting a set of ethical principles. These principles have largely converged around some main areas like fairness, transparency, accountability, contestability, privacy and security, and human-centered values like autonomy and consent.
However, agreeing and establishing these principles does not automatically bridge the gap from organizational principles to the development life cycle. Work and careful thought are needed to embed these considerations into the development and deployment processes without adding prohibitive administrative overhead to any project.
And finally, representation and diversity really count, to ensure that models are robust and performant even for underrepresented groups. And guess what? This is not a set-and-forget process; an iterative process of continuous improvement is needed to make sure that any approaches we take are constantly challenged and improved upon. I think this quote from the revered American poet and civil rights activist Maya Angelou really perfectly represents the journey that we're on, to do good in the world and not harm.
Don't be afraid to have a go at creating a framework and a process, and improving on it as you go. Asking questions, being really curious and having a really good think at each stage of a project can make a huge difference to the outcomes. As you embark on your journey to know better, learn better and do better, there are some key considerations to guide your footsteps.
Please don't just copy and paste some ethical principles from the internet. For these to really get buy-in from your team and really work within your organization, they need to be developed in alignment with your own values and principles.
This will allow your ethical principles to pull in the same direction as your organizational values, for better outcomes all around.
Risk management is a core part of the operationalization of ethics in AI. The identification, mitigation and management of risk gives visibility, facilitates discussion, and enables collective agreement on mitigation strategies. Agreeing and recording what an acceptable level of risk looks like for your organization is key, not just for the development process, but also for understanding and managing models in production.
Subpopulation definition is a critical element of embedding ethics within the modeling process. If you can't define subgroups or populations within your data set, how can you understand what different outcomes or differing levels of accuracy might mean for different groups of people?
And lastly, it's important to consider that adding additional processes alongside development, whether for privacy or ethics or both, needs to be done in a way that's not prohibitively onerous on the development team. As a data science consultancy, we recognize the need to embed these considerations as standard into any project that we undertake, and as we have exposure to many industries and types of use cases, we are perhaps well placed to map out some of that pathway for our clients and partners. The nature of AI or machine learning development is different from other areas of software or application development, in that it is iterative and probabilistic in nature.
But many practitioners, at a high level at least, have converged on the CRISP-ML process model, which adequately describes the phases of any project at a high level and represents the level of iterative development at each stage. This model has evolved from the Cross-Industry Standard Process for Data Mining, or CRISP-DM, which has long been the accepted wisdom for data mining projects, which may be thought of as ancestors of modern AI projects.
So we have three main phases of development. The first phase involves understanding the business requirements and evaluating the data available, to identify a solution approach and validate the technical feasibility of the solution. The second phase is all around the development and evaluation of a model. The third phase is about not just the deployment of a model into production, but the stewardship of that model throughout its lifetime. There are clear points during this process where key ethical considerations can be embedded.
During the scoping phase, it's important to begin with an ethical impact assessment and a risk level calculation to understand the potential implications of any solution that you identify. If an approach were deemed to be higher risk than a different one, that might impact the decision on which approach to proceed with.
The model development process already includes many iterations of analysis of the training data set and the model performance. Adding tasks for subpopulation definition, and then review of both the representation in the input training data set and the performance against defined metrics across those subpopulations, is a reasonable addition to this process.
When the model is ready for deployment, there should be a set of recommendations for the use of the model: for example, a level of aggregation below which the outputs become less accurate, or decision-making processes that shouldn't be based on the outputs of this model.
Let's dive into these activities in a little more detail. The scoping phase of any AI project is critical to ensure the right problem is being solved in the right way: not just so that maximum value is added, but also so that the ethical impacts are well understood and the minimum amount of risk is introduced.
Many organizations already undertake privacy impact assessments at the outset of any data project, which is a great start. But going a step further and extending a PIA to be an EIA, or ethical impact assessment, is an important step up front. This should be structured in such a way that low-risk projects with little or no human or environmental impact can drop out quickly and proceed, while riskier projects are subjected to adequate due diligence. It's important to consider ethical implications at this stage alongside other considerations such as human-centered design, because the potential impacts downstream may affect aspects of the solution design, such as the level of granularity the data is operated on, the explainability requirements, the type of model or method that you choose, and what data you actually have the right to use. The risk register that you should produce at this stage will guide the rest of the project.
The exploratory data analysis required to construct a suitable training data set, to be used as an input to your model or AI system, is a critical part of building expectations and hypotheses around what can be expected from your system. Bias is a key quantitative element of ethical AI, and we know that bias comes from unbalanced data sets and underrepresentation. But how do we define the groups or subpopulations in order to measure representation? It turns out this is a hard question to answer.
If you have demographic data included in your data set, you might choose to use that, but things like ethnicity and even gender can be subjective or inadequate, and a proxy at best. We have identified a few approaches for classifying image data; however, this area is still in its early stages, and I would welcome discussion with anyone who has thoughts on how best to do this step objectively and repeatably. Once you've defined your subpopulations to measure representation and performance, you can select the performance metrics that can be used to measure performance during model training and also on an ongoing basis in production. For example, if you're building a facial recognition system, you might care about just pure face detections, or you might care about other things like recognizing a known person or estimating age.
Once you have a trained model that you're happy with, you need to review the performance for your predetermined subpopulations. It's best practice anyway to interrogate your model, slicing and dicing the performance metrics in as many different cuts and finding as many edge cases as you can, to make sure the model has learned the right behavior from the right features and is robust in as many different situations as you can identify. This step is really just an enhancement of your existing process.
It's really important at this point to think about what an acceptable tolerance is for performance. And by that I mean: if you have a few percentage points' spread in accuracy across your subpopulation groups, what does that mean in reality? If accuracy for all groups is over, say, 90%, would those differences have any material difference in outcomes for any of those groups? What about if accuracy for one group drops below a certain level: what might the impact be of decisions being made for a group with much lower accuracy than the others? Or is there even a tipping point where a few percentage points' difference may have a bigger impact? Asking yourself these questions, and being really curious about the link between model performance and real-world outcomes, helps you set realistic tolerances that can be used to monitor the performance of your model in production.
Before we can deploy an AI system into production, we need to make sure that throughout the development process we've taken steps to mitigate the risks that we identified upfront. Are we happy that a process for informed consent has been baked in? Is there a clear process for contesting outcomes?
And is the level of explainability fit for purpose? All of these decisions and details need to be captured in a living report that should inform any downstream consumer of the system outputs, as well as the team monitoring and maintaining it. Once your AI system is in production, this does not mark the end of your ethics efforts.
As I mentioned back at the start, ethics is not a set-and-forget, box-ticked type of exercise. Understanding how your model is performing over time, and how the performance against your accuracy metrics might vary across your subpopulations, is really important for monitoring the outcomes of your system. Embedding your ethics KPIs within your existing MLOps framework, alongside other performance metrics, is the ideal way to do this. Your performance thresholds can be added as triggers or alerts for monitoring or retraining processes, and you can give visibility of performance to key stakeholders in the business.
Now, I know I've covered a lot in a short amount of time, but the key takeaways are really just to get started today: doing something is infinitely better than doing nothing. So start with a basic framework and evolve it over time, building in your learnings from each iteration. Be curious, and really ponder how model performance impacts real-world outcomes. Monitor and make visible the performance of your model against your ethical metrics. So it's not necessarily a clear-cut or straightforward process, but it's ultimately worthwhile nonetheless. OK, well, thank you all for your time.
My information is in my profile, and I'd love to chat if anybody wants to discuss this topic further. Thank you very much, and enjoy this fantastic conference.