Boosting data literacy: essential skills for early career researchers
Guidance for early career researchers on how to effectively manage and present data
The significance of data literacy is hard to overstate nowadays. When it comes to research, it’s not just a case of crunching numbers or producing charts any more. As a researcher, you are tasked with navigating and filtering extensive datasets to identify valuable patterns and conclusions. Once insights are derived, it’s important to present them in a clear and impactful way, ensuring they are both understandable and actionable. If you’re just starting your research journey, mastering this art could mean the difference between sinking and swimming in academic waters.
Understanding data types and formats
For an early career researcher, mastering the nuances of data is essential. Different research scenarios call for different data formats, and choosing the right one can make your work more efficient and impactful. For basic datasets, such as survey results or experimental observations, the comma-separated value (CSV) format is often ideal. It’s straightforward and compatible with many analysis tools, making it a good starting point for most research projects.
However, if your research involves creating or working with web applications, you might encounter the JavaScript object notation (JSON) format. This is favoured for web-based projects because it is lightweight, readable and handled natively by web browsers, so data can pass between a server and a web page with little conversion. For instance, if you’re developing an online tool to gather real-time feedback from study participants, JSON can help process and display that feedback as soon as it arrives.
In more specialised research areas where intricate datasets are the norm, such as healthcare or finance, extensible markup language (XML) is a valuable format. It’s designed to represent complex, nested data structures and can be checked against a schema, which helps to keep detailed records consistent. For example, if you’re analysing a comprehensive medical database, XML can help to organise data, from patient demographics to detailed medical histories. Additionally, XML wraps each value in descriptive tags, which act as labels (metadata) describing the data’s content and help users understand and locate specific information faster.
As you progress in your research journey, familiarising yourself with these formats will not only streamline your data management, but also enhance the clarity and credibility of your findings.
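To make the differences between the three formats concrete, here is a minimal sketch in Python that reads the same record from CSV, JSON and XML using only the standard library. The participant record and its field names are invented for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical survey record, invented for illustration only.
csv_text = "participant_id,age,response\nP001,29,agree\n"
json_text = '{"participant_id": "P001", "age": 29, "response": "agree"}'
xml_text = (
    '<participant id="P001">'
    "<age>29</age>"
    "<response>agree</response>"
    "</participant>"
)

# CSV: every value arrives as text, so numeric columns need converting.
csv_row = next(csv.DictReader(io.StringIO(csv_text)))
csv_record = {
    "participant_id": csv_row["participant_id"],
    "age": int(csv_row["age"]),
    "response": csv_row["response"],
}

# JSON: numbers and strings keep their types when parsed.
json_record = json.loads(json_text)

# XML: values sit in attributes and child elements, all as text.
root = ET.fromstring(xml_text)
xml_record = {
    "participant_id": root.get("id"),
    "age": int(root.findtext("age")),
    "response": root.findtext("response"),
}

# All three formats carry the same information once parsed.
print(csv_record == json_record == xml_record)
```

The CSV version is the shortest to write, the JSON version preserves data types, and the XML version labels every value with a tag, which is the trade-off described above.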
Effective data management and preparation
Managing data well and preparing it carefully are both crucial. Keeping naming conventions, folder structures and metadata consistent ensures that data is well organised and easily retrievable.
Before diving into any analysis, you must also ensure you have quality data. Data cleaning and preparation is done in three steps:
- Removing errors by using tools or software that can help to identify outliers or anomalies in your datasets. Popular applications include Microsoft Excel, OpenRefine (formerly Google Refine) and SPSS.
- Ensuring that data entries follow a consistent format (for example, date formats should be uniform throughout).
- Converting the data to the format your analysis tool expects.
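The first two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the rows are invented, the two accepted date formats are assumed, and the outlier rule (values more than 1.5 interquartile ranges beyond the quartiles) is one common convention among several.

```python
import statistics
from datetime import datetime

# Hypothetical raw entries, invented for illustration only.
raw_rows = [
    {"date": "2023-01-05", "score": "12.5"},
    {"date": "05/02/2023", "score": "13.1"},   # day/month/year
    {"date": "2023-03-05", "score": "11.9"},
    {"date": "2023-04-05", "score": "12.2"},
    {"date": "2023-05-05", "score": "125.0"},  # probably a typing error
]

# Assumed input formats; extend this tuple to match your own data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalise_date(text):
    """Return the date as YYYY-MM-DD, trying each assumed format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {text!r}")


def flag_outliers(values, k=1.5):
    """Flag values lying more than k interquartile ranges beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [value < low or value > high for value in values]


scores = [float(row["score"]) for row in raw_rows]
flags = flag_outliers(scores)

cleaned = [
    {"date": normalise_date(row["date"]), "score": score, "outlier": flag}
    for row, score, flag in zip(raw_rows, scores, flags)
]

for row in cleaned:
    print(row)
```

Flagging suspicious values rather than deleting them keeps the decision with the researcher, who can check each one against the original source.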
The analytical tool you choose often dictates the format your data should be in. R, for instance, is a powerful programming language used primarily for statistical computing and graphics. Researchers and statisticians use R to analyse data and produce visual representations of it. If your data is in a hierarchical format such as JSON or XML, and you intend to analyse it using R, you’ll need to transform it into a tabular format, commonly referred to in R as a “data frame”. This transformation ensures R can process the data efficiently. You can achieve this by importing the data into R with a package such as jsonlite (for JSON) or xml2 (for XML) and then reshaping the nested fields into columns.
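The same flattening idea can be illustrated outside R. The sketch below uses Python's standard library, not R, to turn a nested JSON structure into one flat row per observation and write it out as CSV, which R and most other tools can read as a table. The participant data and field names are invented for illustration.

```python
import csv
import io
import json

# Hypothetical nested JSON, invented for illustration only.
nested_json = """
[
  {"id": "P001", "demographics": {"age": 29, "region": "north"},
   "visits": [{"date": "2023-01-05", "score": 12.5},
              {"date": "2023-02-05", "score": 13.1}]},
  {"id": "P002", "demographics": {"age": 41, "region": "south"},
   "visits": [{"date": "2023-01-09", "score": 11.9}]}
]
"""


def flatten(participants):
    """Return one flat row per visit, repeating the participant details."""
    rows = []
    for person in participants:
        for visit in person["visits"]:
            rows.append({
                "id": person["id"],
                "age": person["demographics"]["age"],
                "region": person["demographics"]["region"],
                "visit_date": visit["date"],
                "score": visit["score"],
            })
    return rows


rows = flatten(json.loads(nested_json))

# Write the flat rows as CSV text; in practice this would go to a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
table_csv = buffer.getvalue()

print(table_csv)
```

Repeating the participant details on every visit row is what makes the data tabular: each row is a complete observation and each column holds one kind of value.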
Furthermore, data needs protecting against both loss and unauthorised access. Regular backups, both on local storage such as external hard drives and on cloud storage solutions, can safeguard against data loss. In a world where data breaches are all too common, encrypting sensitive data adds a layer of security, ensuring that, even if it is accessed without authorisation, it will remain unintelligible. Restricting data access through strong, unique passwords and multi-factor authentication (MFA) can further reduce breach risks.
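A backup is only useful if the copy is intact. As a minimal sketch of the backup step alone (it does not cover encryption or access control), the Python below copies a file and confirms that the copy matches the original by comparing SHA-256 checksums. The file name, folder layout and the `back_up` helper are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dir):
    """Copy source into backup_dir and check the copy against the original."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup of {source} does not match the original")
    return target


# Demonstration in a temporary folder so nothing real is touched.
with tempfile.TemporaryDirectory() as workspace:
    original = Path(workspace) / "survey.csv"
    original.write_text("participant_id,score\nP001,12.5\n", encoding="utf-8")
    copy = back_up(original, Path(workspace) / "backup")
    backup_verified = sha256_of(original) == sha256_of(copy)

print(backup_verified)
```

In real use the backup folder would sit on a different device or service from the original, since a copy on the same disk does not protect against that disk failing.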
Advanced data analysis
As you delve deeper, advanced data analysis becomes the core of your work. Using specialised software to identify patterns and trends is essential. Part of this analysis includes the use of methods such as regression analysis, which helps make sense of the relationship between variables, and t-tests, which compare the means of two groups. Gaining familiarity with statistical tools such as R, Python and SPSS can help significantly.
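To show what these two methods compute, here is a minimal sketch in plain Python. It fits a straight line by least squares and calculates a t-statistic using Welch's version of the t-test, which does not assume the two groups have equal variances. The study-hours and test-score figures are invented, and in practice a statistics package would also report p-values and confidence intervals.

```python
import math
import statistics


def simple_regression(x, y):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def welch_t(a, b):
    """Welch's t-statistic for the difference between two group means."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical data, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5]
test_scores = [52, 55, 61, 64, 68]
control_scores = [48, 50, 53, 57, 52]

slope, intercept = simple_regression(hours_studied, test_scores)
t_statistic = welch_t(test_scores, control_scores)

print(f"Each extra hour is associated with {slope:.1f} more points")
print(f"Predicted score with no study: {intercept:.1f}")
print(f"Welch t-statistic between the groups: {t_statistic:.2f}")
```

The slope describes how the outcome changes with the predictor, while the t-statistic expresses the gap between two group means relative to the variability in the data. Turning that statistic into a p-value requires the t-distribution, which tools such as R, SPSS or Python's SciPy library handle for you.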
Visualising and communicating data
Effective data visualisation is an art. By designing clear visuals – such as pie charts showing market shares, line graphs tracking changes over time, or maps highlighting regional variations – you can present complex data in an easily digestible way. Tools such as Tableau or Power BI elevate this further, allowing for the use of interactive dashboards where users can, for instance, zoom into specific data points or filter results.
However, beyond presenting data, true data literacy is about weaving a narrative. You should aim to tell a story with your data, perhaps by using relatable analogies, compelling infographics or even interactive presentations, ensuring your findings resonate with everyone, from fellow researchers to the general public.
Collaborating with academics
Communication in the academic realm extends beyond mere presentation; it’s deeply rooted in collaboration. Data literacy means not only understanding data but also leveraging collaborative platforms such as GitHub to pursue joint projects or tap into shared data repositories to benefit from colleagues’ datasets. It’s about working together on joint publications and combining diverse expertise to produce richer insights. This synergy, where researchers share, collaborate and build upon each other’s work, propels the academic community to new heights, breaking barriers and broadening perspectives.
Enhancing data literacy skills: a continuous journey
The journey towards boosting your data literacy skills is continuous but rewarding. It’s essential to start simple, focusing on foundational concepts before venturing into advanced areas.
There are several ways that you can boost your data literacy as an early career researcher.
Join online courses or workshops: both can teach you the basics of data literacy.
Find a mentor: if you know someone who has strong data literacy skills, ask them to mentor you. They can help you to learn the basics of data analysis and visualisation, and they can also give you feedback on your work.
Join a data science club or community: data science clubs and online communities (such as Reddit, Open Data Science, Data Science Central and Data Science Society) offer opportunities to learn from other researchers, ask questions and get feedback on your work.
Read books and articles about data literacy: books such as Be Data Literate: The Data Literacy Skills Everyone Needs To Succeed by Jordan Morrow or Data Science for Dummies by Lillian Pierson (published by John Wiley & Sons) can teach you the basics of data literacy, as well as more advanced topics such as machine learning and artificial intelligence.
Navigating the data-driven landscape
As the academic landscape becomes increasingly data driven, the importance of data literacy becomes ever more pronounced. By embracing and enhancing these skills, early career researchers can set sail with confidence, ready to explore the vast oceans of data and the treasures they hold.
Derval Dally is a webmaster in the regional headquarters of the University of the West Indies.