The Best Data Science Certification You’ve Never Heard Of
Eight years ago, data science was proclaimed “the sexiest job of the 21st century.” Yet plodding through hours of data munging still feels decidedly unsexy. If anything, the storied rise of the data science career has illustrated just how poorly most organizations are doing when it comes to managing their data.
Enter the Certified Data Management Professional (CDMP) from Data Management Association International (DAMA). The CDMP is the best data strategy certification you’ve never heard of. (And honestly, when you consider the fact that you’re probably working a job that didn’t exist ten years ago, it’s not surprising that this certification isn’t widespread just yet.)
Data strategy is a crucial discipline that spans end-to-end management of the data lifecycle as well as associated aspects of data governance and key considerations of data ethics.
This article outlines the hows and whys of getting the CDMP, which lays the groundwork for effective thought leadership on data strategy. It also includes a survey — you can offer your thoughts on the most important aspects of data management for data science and check out the consensus of the community.
Disclaimer: this post is not in any way sponsored. Views reflected are mine alone.
About the CDMP Exam
Training for the CDMP confers expertise across 14 areas related to data strategy (which I’ll cover in more detail in a later section). Studying takes about 50–60 hours — so roughly comparable to the time commitment required for something like the AWS Cloud Practitioner Certification.
When you schedule the exam ($300), DAMA provides 40 practice questions that are pretty reflective of the difficulty of the actual exam. As a further resource, check out this article about the process of studying for a certification.
The test is 100 questions limited to 90 minutes — generally the full time is used. It’s possible to sit for the exam online while monitored via webcam ($11 proctoring fee). The format of the exam is multiple choice — either 5 options or T/F. You can mark questions and come back to them. At the conclusion of test taking, you get immediate feedback on your score.
Anything over 60% is considered passing. This is just fine if you’re interested in getting your CDMP Associate certification and moving along. If you’re interested in the advanced tiers of CDMP certification, you’ll have to score at least 70% (CDMP Practitioner) or 80% (CDMP Master). To get certified at the highest level, CDMP Fellow, you’ll need to attain the Master certification and also demonstrate industry experience and contribution to the field. Each of these advanced certifications also requires passing two Specialist exams.
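The score thresholds above can be sketched as a small lookup. This is an illustration only: the function and constant names are hypothetical, treating each threshold as inclusive is an assumption, and the sketch covers the exam score alone, not the Specialist exams or the experience requirements.

```python
from typing import Optional

# Score thresholds from the article, highest first.
# Treating each threshold as inclusive (>=) is an assumption.
TIER_THRESHOLDS = [
    (80, "Master"),
    (70, "Practitioner"),
    (60, "Associate"),
]

def score_tier(percent: float) -> Optional[str]:
    """Return the highest CDMP tier whose score threshold is met, or None."""
    for threshold, tier in TIER_THRESHOLDS:
        if percent >= threshold:
            return tier
    return None

for score in (55, 64, 72, 85):
    print(score, score_tier(score))
```

A score in the 70s meets the Practitioner threshold, but per the requirements above it only counts toward that tier once the two Specialist exams are passed as well.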
This brings me to my final point: why, purely from a career advancement standpoint, you should choose to put yourself through the studying and exam-taking process for the CDMP. Certification from DAMA is associated with high-end positions in leadership, management, and data architecture. (Think of CDMP as getting credentialed into a semi-secret society of data ninjas.) Increasingly, enterprise roles and federal contracts related to data management are requesting CDMP certification.
Pros
- Provides a well-rounded knowledge base on topics related to data strategy
- Four tiers for different levels of data management professionals
- 60% score requirement to pass the lowest level of certification
- Associated with elite roles
- Provides a 3-year membership to DAMA International
- $311 exam fee is cheaper than other data-related certifications from Microsoft and The Open Group
Cons
- DAMA is not backed by a major tech company (e.g. Amazon, Google, Microsoft) that is actively pushing marketing efforts and driving brand recognition for CDMP certification. As a result, the CDMP is likely to be recognized as valuable mainly among people who are already familiar with data management
- $311 exam fee is relatively expensive compared to the AWS Cloud Practitioner cert ($100) or GCP certs ($200)
Alternative certifications
- Microsoft Certified Solutions Associate (MCSA): modularized certifications focusing on various Microsoft products ($330+)
- Microsoft Certified Solutions Expert (MCSE): builds on the MCSA with integrated certifications on topics such as Core Infrastructure, Data Management & Analytics, and Productivity ($495+)
- The Open Group Architecture Framework (TOGAF): various levels of certification on a high-level framework for software development and enterprise architecture methodology ($550+)
- Scaled Agile Framework (SAFe): role-based certifications for software engineering teams ($995)
How to prepare for CDMP
To study for the exam, all you need is the DAMA Data Management Body of Knowledge (DMBOK, $55). It’s around 600 pages, but if you focus your study time mainly on Chapter 1 (Data Management), diagrams and schemas, roles and responsibilities, and definitions, this should get you 80% of the way toward a passing score.
As for how to use the DMBOK, one test taker recommended 4–6 hours per weekend for 8–10 weeks. Another approach could be reading a couple of pages each morning and evening. However you tackle it, make sure you’re incorporating spaced repetition into your study routine.
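As a rough check on the weekend schedule, the plan can be multiplied out and compared with the 50–60 hour estimate given earlier. This is a minimal sketch; the names are illustrative and the numbers come straight from the text.

```python
# Study plan from the text: 4-6 hours per weekend for 8-10 weeks.
HOURS_PER_WEEKEND = (4, 6)
WEEKS = (8, 10)
# Overall estimate quoted earlier in the article: 50-60 hours.
TARGET_HOURS = (50, 60)

def plan_range(hours, weeks):
    """Return (minimum, maximum) total hours for a weekend study plan."""
    return hours[0] * weeks[0], hours[1] * weeks[1]

low, high = plan_range(HOURS_PER_WEEKEND, WEEKS)
print(f"Weekend plan: {low}-{high} hours")
print(f"Light end reaches the 50-hour mark: {low >= TARGET_HOURS[0]}")
```

The light end of the plan works out to 32 hours, short of the 50–60 hour estimate, so the heavier end of the range is the safer bet.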
In addition to being your study guide for the exam, the DMBOK is of course useful as a reference book, and you can drop it on your colleague’s desk if they need to learn data strategy or if they’ve nodded off during a webinar.
What’s tested on the CDMP
The CDMP covers 14 topics. I’ve listed them in order of their prevalence on the exam and provided a brief definition for each.
Data Governance (11%): practices and processes to ensure formal management of data assets.
Data Quality (11%): assuring data is fit for consumption based on its accuracy, completeness, consistency, integrity, reasonability, timeliness, uniqueness/deduplication, validity, and accessibility.
Data Modelling and Design (11%): translation of business needs into technical specifications.
Metadata Management (11%): information about data collected.
Master and Reference Data Management (10%): reference data is information used to categorize other data found in a database, or information that is solely for relating data in a database to information beyond the boundaries of the organization. Master data refers to information that is shared across a number of systems within the organization.
Data Warehousing and Business Intelligence (10%): a data warehouse stores information from operational systems (as well as other data resources, potentially) in a way that is optimized to support decision-making processes. Business intelligence refers to the use of technology to gather and analyze data, then translate it into useful information.
Document and Content Management (6%): technologies, methods, and tools used to organize and store an organization’s documents.
Data Integration and Interoperability (6%): use of technical and business processes to merge data from different sources, with the goal of readily and efficiently providing access to valuable information.
Data Architecture (6%): specifications to describe existing state, define data requirements, guide data integration, and control data assets, according to the organization’s data strategy.
Data Security (6%): implementation of policies and procedures to ensure people and things take the right actions with data and information assets, even in the presence of malicious inputs.
Data Storage and Operations (6%): characterization of hardware or software that holds, deletes, backs up, organizes, and secures an organization’s information.
Data Management Process (2%): end-to-end management of data, including collection, control, protection, delivery, and enhancement.
Big Data (2%): extremely large datasets, often composed of various structured, unstructured, and semi-structured data types.
Data Ethics (2%): code of conduct encompassing data handling, algorithms, and other practices to ensure that data is used appropriately in a moral context.
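Because the exam has 100 questions, the percentages above translate directly into approximate question counts. The sketch below, with hypothetical names, tallies the weights and shows how much of the exam the six heaviest topics cover. The counts on any given exam may differ, so treat this as a study-planning aid rather than a guarantee.

```python
# Topic weights (percent of exam) as listed in the article.
TOPIC_WEIGHTS = {
    "Data Governance": 11,
    "Data Quality": 11,
    "Data Modelling and Design": 11,
    "Metadata Management": 11,
    "Master and Reference Data Management": 10,
    "Data Warehousing and Business Intelligence": 10,
    "Document and Content Management": 6,
    "Data Integration and Interoperability": 6,
    "Data Architecture": 6,
    "Data Security": 6,
    "Data Storage and Operations": 6,
    "Data Management Process": 2,
    "Big Data": 2,
    "Data Ethics": 2,
}
TOTAL_QUESTIONS = 100  # exam length stated in the article

def expected_questions(weights, total=TOTAL_QUESTIONS):
    """Approximate number of questions per topic, given percent weights."""
    return {topic: round(total * pct / 100) for topic, pct in weights.items()}

counts = expected_questions(TOPIC_WEIGHTS)
total_weight = sum(TOPIC_WEIGHTS.values())
top_six = sum(sorted(counts.values(), reverse=True)[:6])

print(f"Weights total: {total_weight}%")
print(f"Questions from the six heaviest topics: {top_six}")
```

The weights total 100%, and the six heaviest topics account for 64 of the 100 questions, which is a reasonable argument for front-loading them in a study plan.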
Why data scientists should get CDMP certified
Still not convinced that data strategy is important? Let’s take a look from the perspective of a data scientist aiming to increase their knowledge and earning potential.
It’s been said that a data scientist sits at the nexus of statistics, computer science, and domain knowledge. Why would you want to add one more thing to your plate?
Successwise, you’re better off being good at two complementary skills than being excellent at one
Scott Adams, author and creator of the Dilbert comics, offers the idea that “every skill you acquire doubles your odds of success.” He acknowledges this may be somewhat of an oversimplification — “obviously some skills are more valuable than others, and the twelfth skill you acquire might have less value than each of the first eleven” — but the point is that sometimes it’s better to go wide than to go deep.
Setting aside the relative magnitude of the benefit (because I seriously doubt it’s 2x per skill… thank you, law of diminishing marginal returns), it seems unquestionable that broadening your skillset can lead to more significant gains than toiling away at learning one specific skill. In a nutshell, this is why I think it’s important for a data scientist to learn data strategy.
Generally speaking, having diversity in your skillset allows you to:
- Problem solve more effectively by drawing on cross-disciplinary learnings
- Communicate better with your teammates from other specialties
- Get your foot in the door in terms of gaining access to new projects
Understanding data strategy transforms you from being a data consumer into an empowered data advocate at your organization. It’s worth putting up with all the tongue twister acronyms (DMBOK — really? Couldn’t they have just called it The Data Management Book?) in order to deepen your appreciation for the end-to-end knowledge generating process.