As financial services firms scramble to keep pace with technological advancements like machine learning and artificial intelligence (AI), data governance (DG) and data management (DM) are playing an increasingly important role — a role that is often downplayed in what has become a technology arms race.
DG and DM are core components of a successful enterprise data and analytics platform. They must fit within an organization’s investment philosophy and structure. Embracing business domain knowledge, experience, and expertise empowers the firm to manage big data (BD) alongside traditional small data.
No doubt, the deployment of advanced technologies will drive greater efficiencies and secure competitive advantages through greater productivity, cost savings, and differentiated strategies and products. But no matter how sophisticated and expensive a firm’s AI tools are, it should not forget that the principle “garbage in, garbage out” (GIGO) applies to the entire investment management process.
Flawed, poor-quality input data is destined to produce faulty, useless outputs. AI models must be trained, validated, and tested on high-quality data that has been extracted and prepared for each of those purposes.
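To make the train/validate/test discipline concrete, here is a minimal sketch in Python. The split ratios, the quality filter, and the record layout are illustrative assumptions, not a prescription from any particular framework:

```python
import random

def split_dataset(records, train=0.7, val=0.15, seed=42):
    """Shuffle and split records into train/validation/test sets.

    Hypothetical helper: ratios and the quality filter below are
    illustrative only.
    """
    # GIGO guard: drop records with missing fields before splitting.
    clean = [r for r in records if all(v is not None for v in r.values())]
    rng = random.Random(seed)
    rng.shuffle(clean)
    n = len(clean)
    n_train = int(n * train)
    n_val = int(n * val)
    return (clean[:n_train],
            clean[n_train:n_train + n_val],
            clean[n_train + n_val:])

# Example: ten records, one with a missing value that gets filtered out.
data = [{"x": i, "y": i * 2} for i in range(9)] + [{"x": 9, "y": None}]
train_set, val_set, test_set = split_dataset(data)
```

The point of the guard at the top is exactly GIGO: screening out incomplete records before any model ever sees them.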
Getting the data right often sounds uninteresting, even boring, to most investment professionals. Besides, practitioners typically do not consider DG and DM part of their job description.
But there is a growing recognition among industry leaders that cross-functional, T-Shaped Teams will help organizations develop investment processes that incorporate AI and big data (BD). Yet, despite increased collaboration between the investment and technology functions, the critical inputs of DG and DM are often not sufficiently robust.
The Data Science Venn Diagram
BD is the primary input of AI models. Data Science is an inter-disciplinary field comprising overlaps among math and statistics, computer science, domain knowledge, and expertise. As I wrote in a previous blog post, human teams that successfully adapt to the evolving landscape will persevere. Those that don’t are likely to render themselves obsolete.
Exhibit 1 illustrates the overlapping functions. Looking at the Venn Diagram through the lens of job functions within an investment management firm: AI professionals cover math and statistics; technology professionals tackle computer science; and investment professionals bring a depth of knowledge, experience, and expertise to the team — with the help of data professionals.
Exhibit 1.
Table 1 outlines the defining features of BD, the five Vs. Clearly, professionals skilled in only one area cannot be expected to handle this level of complexity.
Table 1. BD and the Five Vs
Volume, veracity, and value are challenging because of nagging uncertainty about the completeness and accuracy of the data, as well as the validity of the insights garnered from it.
To unleash the potential of BD and AI, investment professionals must understand how these concepts operate together in practice. Only then can BD and AI drive efficiency, productivity, and competitive advantage.
Enter DG and DM. They are critical for managing data protection and data privacy, which are areas of significant regulatory focus. That includes post-global financial crisis regulatory reform, such as the Basel Committee on Banking Supervision’s Standard 239 (BCBS 239) and the European Union’s Solvency II Directive. More recent regulatory actions include the European Central Bank’s Data Quality Dashboard, the California Consumer Privacy Act, and the EU’s General Data Protection Regulation (GDPR), which compels the industry to better manage the privacy of individuals’ personal data.
Future regulations are likely to give individuals increased ownership of their data. Firms should be working to define digital data rights and standards, particularly in how they will protect individual privacy.
Data incorporates both the raw, unprocessed inputs and the resulting “content.” Content is the product of analysis, often presented on dashboards that enable storytelling. DG models can be built on this foundation, and DG practices will not necessarily be the same across organizations. Notably, DG frameworks have yet to address how to handle BD and AI models, which are often ephemeral and change frequently.
What Are the Key Components of Data Governance?
Alignment and Commitment: Alignment on data strategy across the enterprise, and management’s commitment to it, are critical. Guidance from a multi-stakeholder committee within the organization is desirable.
From an internal control and governance perspective, a minimum level of transparency, explainability, interpretability, auditability, traceability, and repeatability must be ensured so that a committee can analyze the data and the models used and approve deployment. This function should be separate from the well-documented data research and model development process.
Security: Data security is the practice of defining, labeling, and approving data by its levels of risk and reward and then granting secure access rights to the appropriate parties. In other words, it means putting security measures in place to protect data from unauthorized access and corruption. Striking a balance between user accessibility and security is key.
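The labeling-then-granting pattern described above can be sketched as a simple clearance check. The sensitivity tiers and role names here are illustrative assumptions, not an industry standard:

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """Data labels ordered from least to most sensitive (illustrative tiers)."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Hypothetical clearance map: role names are examples only.
ROLE_CLEARANCE = {
    "analyst": Sensitivity.INTERNAL,
    "portfolio_manager": Sensitivity.CONFIDENTIAL,
    "data_steward": Sensitivity.RESTRICTED,
}

def can_access(role, dataset_label):
    """Grant access only when the role's clearance meets the data's label."""
    clearance = ROLE_CLEARANCE.get(role, Sensitivity.PUBLIC)
    return clearance >= dataset_label
```

Unknown roles default to the lowest clearance, which is the balance the post describes: accessibility where warranted, denial by default everywhere else.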
Transparency: Every policy and procedure a firm adopts must be transparent and auditable. Transparency means enabling data analysts, portfolio managers, and other stakeholders to understand the source of the data and how it is processed, stored, consumed, archived, and deleted.
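Tracing how data is sourced, processed, stored, and deleted amounts to keeping a lineage log. The sketch below is a minimal in-memory version under stated assumptions (the dataset and actor names are made up; a production system would persist to an immutable store):

```python
from datetime import datetime, timezone

lineage = []  # append-only audit trail: who did what to which dataset, when

def log_step(dataset, action, actor):
    """Record one processing step so any output can be traced to its source."""
    lineage.append({
        "dataset": dataset,
        "action": action,   # e.g., sourced, processed, stored, archived, deleted
        "actor": actor,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def trace(dataset):
    """Reconstruct the processing history of a dataset, in order."""
    return [(e["action"], e["actor"]) for e in lineage if e["dataset"] == dataset]

# Hypothetical lifecycle of a pricing dataset.
log_step("eod_prices", "sourced", "vendor_feed")
log_step("eod_prices", "processed", "etl_job_42")
```

Because every step is timestamped and attributed, an analyst or auditor can answer "where did this number come from?" without reverse-engineering the pipeline.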
Compliance: Ensuring that controls are in place to comply with corporate policies and procedures as well as regulatory and legislative requirements is not enough. Ongoing monitoring is necessary. Policies should include identifying attributes of sensitive information, protecting privacy via anonymization and tokenization of data where possible, and fulfilling requirements of information retention.
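Tokenization of sensitive attributes, as mentioned above, can be sketched with a keyed hash. This is one common approach, not the only one; the key handling here is a placeholder (a real deployment would keep the key in a secrets manager and address rotation and retention):

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-in-a-vault"  # assumption: key lives in a secrets manager

def tokenize(value: str) -> str:
    """Replace a sensitive identifier with a deterministic, keyed token.

    HMAC-SHA-256 keeps the token stable, so datasets can still be joined
    on it, while preventing reversal without the key.
    """
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

# Hypothetical record: the client identifier is tokenized, the rest kept.
record = {"client_id": "C-10293", "position": 1500}
safe_record = {**record, "client_id": tokenize(record["client_id"])}
```

Determinism is the design choice worth noting: the same client always maps to the same token, which preserves analytical utility while removing the raw identifier.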
Stewardship: An assigned team of data stewards should be established to monitor and control how business users tap into data. Leading by example, these stewards will ensure data quality, security, transparency, and compliance.
What Are the Key Elements of Data Management?
Preparation: This is the process of cleaning and transforming raw data to allow for data completeness and accuracy. This critical first step sometimes gets missed in the rush for analysis and reporting, and organizations find themselves making garbage decisions with garbage data.
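The cleaning-and-transforming step described above might look like the following sketch. The field names and validation rules are illustrative assumptions about a pricing feed:

```python
def prepare(rows):
    """Clean raw rows: trim whitespace, normalize tickers, coerce prices to
    floats, and drop rows that fail validation (illustrative rules)."""
    cleaned = []
    for row in rows:
        ticker = row.get("ticker", "").strip().upper()
        try:
            price = float(row.get("price", ""))
        except (TypeError, ValueError):
            continue  # unparseable price: exclude rather than guess
        if not ticker or price <= 0:
            continue  # missing ticker or nonsensical price: exclude
        cleaned.append({"ticker": ticker, "price": price})
    return cleaned

# Hypothetical raw feed with the kinds of defects preparation catches.
raw = [
    {"ticker": " aapl ", "price": "189.50"},
    {"ticker": "msft", "price": "n/a"},   # bad price: dropped
    {"ticker": "", "price": "42.0"},      # missing ticker: dropped
]
```

Dropping rather than imputing is a deliberate choice here: for the GIGO reasons above, silently guessing a missing value can be worse than excluding the record.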
Creating a data model that is “built to evolve constantly” is far better than creating one that is “built to last as it is.” The data model should meet today’s needs and adapt to future change.
Databases collected under heterogeneous conditions (i.e., different populations, regimes, or sampling methods) create opportunities for analysis that individual data sources cannot. At the same time, combining such heterogeneous environments gives rise to potential analytical challenges and pitfalls, including sample selection, confounding, and cross-population biases. Standardization and data aggregation make data handling and analysis straightforward, but not necessarily insightful.
Catalogs, Warehouses, and Pipelines: Data catalogs house the metadata and provide a holistic view of the data, making it easier to find and track. Data warehouses consolidate all data across catalogs, and data pipelines automatically transfer data from one system to another.
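A data catalog entry, at its simplest, is structured metadata that makes datasets findable. The schema below is a minimal illustration; the fields and dataset names are assumptions, not a catalog standard:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class CatalogEntry:
    """Minimal data-catalog record (illustrative fields, not a standard schema)."""
    name: str
    owner: str
    source_system: str
    last_refreshed: date
    tags: list = field(default_factory=list)

# Hypothetical catalog keyed by dataset name.
catalog = {
    "eod_prices": CatalogEntry("eod_prices", "market-data-team",
                               "vendor_feed", date(2023, 6, 1),
                               tags=["pricing", "daily"]),
}

def find_by_tag(catalog, tag):
    """Search the catalog by tag: the discoverability catalogs exist for."""
    return [entry.name for entry in catalog.values() if tag in entry.tags]
```

The metadata, not the data itself, lives in the catalog; the warehouse holds the data, and the catalog tells you it exists, who owns it, and how fresh it is.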
Extract, Transform, Load (ETL): ETL refers to extracting data from source systems, transforming it into a consistent format, and loading it into the organization’s data warehouse. ETL processes are often automated, preceded by data preparation, and fed by data pipelines.
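The three ETL stages can be sketched end to end in a few lines. The vendor CSV and the warehouse schema here are invented for illustration, and an in-memory SQLite database stands in for a real warehouse:

```python
import csv
import io
import sqlite3

# Hypothetical vendor file; in practice "extract" would pull from an API or file drop.
RAW_CSV = "ticker,price\naapl,189.50\nmsft,337.20\n"

def extract():
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(RAW_CSV)))

def transform(rows):
    """Transform: normalize tickers and coerce prices for the warehouse schema."""
    return [(r["ticker"].upper(), float(r["price"])) for r in rows]

def load(rows, conn):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS prices (ticker TEXT, price REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract()), conn)
```

The stages compose cleanly, which is why ETL jobs lend themselves to the automation the post mentions: each step has one input shape and one output shape.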
Data Architecture: This is the formal structure for managing data flow and storage.
DM follows policies and procedures defined in DG. The DM framework manages the full data lifecycle that meets organizational needs for data utilization, decision-making, and concrete actions.
Having these DG and DM frameworks in place is critical to analyzing complex BD. If data is to be treated as an important company asset, it needs to be structured and managed as such.
What is more, DG and DM should work in synchronization. DG without DM and its implementation is pie in the sky. DG puts the policies and procedures in place; DM and its implementation enable the organization to analyze data and make decisions.
To use an analogy, DG creates the blueprint for a new building, and DM is the act of constructing it. You can construct a small building (DM in this analogy) without a blueprint (DG), but it will be less efficient, less effective, and noncompliant with regulations, and it will be more likely to collapse when a powerful earthquake hits.
Understanding both DG and DM will help your organization make the most of the available data and make better business decisions.
References
Larry Cao, CFA, CFA Institute (2019), AI Pioneers in Investment Management, https://www.cfainstitute.org/en/research/industry-research/ai-pioneers-in-investment-management
Larry Cao, CFA, CFA Institute (2021), T-Shaped Teams: Organizing to Adopt AI and Big Data at Investment Firms, https://www.cfainstitute.org/en/research/industry-research/t-shaped-teams
Yoshimasa Satoh, CFA (2022), Machine Learning Algorithms and Training Methods: A Decision-Making Flowchart, https://blogs.cfainstitute.org/investor/2022/08/18/machine-learning-algorithms-and-training-methods-a-decision-making-flowchart/
Yoshimasa Satoh, CFA and Michinori Kanokogi, CFA (2023), ChatGPT and Generative AI: What They Mean for Investment Professionals, https://blogs.cfainstitute.org/investor/2023/05/09/chatgpt-and-generative-ai-what-they-mean-for-investment-professionals/
Tableau, Data Management vs. Data Governance: The Difference Explained, https://www.tableau.com/learn/articles/data-management-vs-data-governance
KPMG (2021), What is data governance — and what role should finance play? https://advisory.kpmg.us/articles/2021/finance-data-analytics-common-questions/data-governance-finance-play-role.html
Deloitte (2021), Establishing a “built to evolve” finance data strategy: Robust enterprise information and data governance models, https://www2.deloitte.com/us/en/pages/operations/articles/data-governance-model-and-finance-data-strategy.html
Deloitte (2021), Defining the finance data strategy, enterprise information model, and governance model, https://www2.deloitte.com/content/dam/Deloitte/us/Documents/process-and-operations/us-defining-the-finance-data-strategy.pdf
Ernst & Young (2020), Three priorities for financial institutions to drive a next-generation data governance framework, https://assets.ey.com/content/dam/ey-sites/ey-com/en_gl/topics/banking-and-capital-markets/ey-three-priorities-for-fis-to-drive-a-next-generation-data-governance-framework.pdf
OECD (2021), Artificial Intelligence, Machine Learning and Big Data in Finance: Opportunities, Challenges, and Implications for Policy Makers, https://www.oecd.org/finance/artificial-intelligence-machine-learning-big-data-in-finance.htm