It is important to know your data, with all its errors, quirks, and trends, before starting formal statistical analysis. Exploring the data before applying statistics helps confirm assumptions, detect erroneous values, and provide useful insights for formal analysis. However, nobody can directly comprehend a voluminous spread of rows and columns filled with numbers and categories. Instead, we can pull out parts of the data to construct summaries in the form of tables and graphs. Such representations of the data provide clear and intuitive ways to gain personal clarity and facilitate communication with peers.
This workshop will focus on tools for data handling and visualization. Participants will explore data using different types of graphs, such as frequency histograms, box plots, and scatter plots. The hands-on session will rely heavily on the R statistical environment and make full use of its command-line interface. Participants who are not acquainted with R will receive additional material before the workshop to familiarize themselves with its interface and basic workings.
The workshop will cover the following broad topics:
1. Using summaries of data for sanity checks
2. Plots with a single variable
3. Plots with two or more variables
4. Customizing symbols, legends, axes, and other elements
5. Constructing plots like a sentence or argument (using the grammar of graphics through the 'ggplot2' package)
6. Exporting publication-quality graphs