‘Data entry’ as an academic process: basics of data management and organization
In the strongly numerical turn of ecology and conservation, there is a popular drive towards learning and using statistics to interpret environmental datasets. Researchers often feel that knowing their statistics can give their claims a ‘voice’ through publications and quantitative evidence. However, in numerous statistical workshops and classroom sessions, we have frequently found that the first step towards statistical analysis is a slippery one. This is the stage of data entry and handling, where students and researchers often get stuck on how to enter their voluminous data in a clear, unambiguous, statistically tenable and cross-software compatible manner. We argue that overcoming this initial hurdle is as much a part of the ‘academic process’ as the thirst for number crunching that follows. Moreover, good practices in data entry must be adopted as an effective means of communicating research in the ‘open data’ age.
This proposed short workshop aims to provide some basic guidelines on good practices of data entry, management and organization. We plan to make this a hands-on workshop that focuses on getting the basics right: entering data into spreadsheet packages and moving it between those packages and the statistical software R. The workshop will cover the following broad topics:
0. Data entry as an academic process: an introduction to concepts
1. Handling data: basics of data entry in spreadsheet format
2. Data formats: matrices, data frames, arrays, lists, vectors, etc.; dimensions and structure of data
3. Dealing with missing values, blanks and zero values; data cleaning and preparation
4. Metadata maintenance and upkeep
5. Handling geospatial coordinates and time/date-stamps
6. Data arrangements: long and wide formats, etc.; data types (alphanumeric data, factors, characters, numbers, text, etc.)
7. Data entry and management for specific types of research: population ecology, community ecology, wildlife surveys, social surveys
8. Steps towards exploratory data analysis: pivot tables, queries, outlier and error detection, basic comparisons
9. Data entry with specific statistical analysis in mind: correlation, regression, statistical hypothesis testing
10. Interface with R: data entry and handling in R
Main aims and goals:
To outline basic good practices of data entry and management for students and early-career researchers as an important part of the academic process and effective communication.
To help students organize their data well for future statistical analysis (without tears) as well as for archiving purposes.
Target audience: Students (preferably undergrad-level) and early-career researchers involved in environmental research and conservation.
Workshop organisers: Aniruddha Marathe, Nachiket Kelkar