I decided to take a look at R this weekend between our family events. I had looked at R before when I ran across the R Tutorial. I bookmarked it and decided I would come back later. The other day at work a professor performing big data analysis started a conversation about R and offered to create some examples. He recommended working through the examples and tweaking those as a way to learn R.
Upon further consideration, I decided to go back to the R Tutorial and work through the basics of input and data types. I felt like I needed a base before trying some of the examples. This approach seemed to work well, at least for me. I worked through the first two or three sections of the tutorial. When I got to the plotting section, things got interesting. Creating plots with the built-in functions seemed very straightforward and powerful. This led me on a quest to find plotting libraries with graphs that were, to my eye, more visually appealing (though the default plots aren’t too shabby).
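To give a sense of how little code a built-in plot takes, here is a minimal sketch. The data is made up for illustration (it is not from the tutorial), and the plot is written to a temporary PDF only so the snippet also runs without a display.

```r
# Hypothetical sample data, invented for this sketch
x <- 1:10
y <- x^2

# Draw to a temporary PDF file instead of an on-screen window
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# type = "b" draws both points and connecting lines
plot(x, y, type = "b", main = "y = x^2", xlab = "x", ylab = "y")

# Close the graphics device so the file is flushed to disk
invisible(dev.off())
```

In an interactive session, the `pdf()` and `dev.off()` calls can be dropped and the plot simply appears in the plot window.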
The next graphing library I looked at was ggplot2. From the website examples, this library appeared to be exactly what I was looking for. Each function is well documented in the reference manual. By the time I started trying to use the library, I had already pulled in my dataset from a CSV file. My initial problem was figuring out how the data is passed to the ggplot function and how this relates to the qplot function. The “grammar of graphics” was getting the best of me. After searching the web, I found the R Cookbook, which had more examples of scatter plots, not to mention a lot of other good R information. These examples provided the missing link for me: how to supply the data to the graphing functions to get the plots to work. With the combination of ggplot2 and the R Cookbook, I was able to create graphs that provided some additional insight into the data.
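For anyone stuck at the same point, the piece that clicked for me can be sketched roughly like this. The column names and values are hypothetical stand-ins for my dataset, and the sketch assumes ggplot2 is already installed (for example via `install.packages("ggplot2")`). The data frame goes in as the first argument, and `aes()` maps its columns to the axes.

```r
library(ggplot2)

# Hypothetical data written to a temporary CSV to mimic reading a real file
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(hours = c(1, 2, 3, 4, 5), score = c(52, 58, 65, 71, 80)),
  csv_path,
  row.names = FALSE
)

# read.csv returns a data frame, which is what ggplot expects
results <- read.csv(csv_path)

# Data frame first, then aes() maps columns to x and y;
# geom_point() adds the scatter plot layer
p <- ggplot(results, aes(x = hours, y = score)) +
  geom_point()

# Roughly equivalent qplot shorthand (recent ggplot2 releases mark it as deprecated):
#   qplot(hours, score, data = results)

# Save the plot to a temporary file; print(p) would display it interactively
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

As far as I can tell, qplot is a convenience wrapper that builds the same kind of plot object, which is why the two calls need the same data frame and column names.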
Some other things I wanted to note:
- Installing packages from CRAN is extremely easy, and the ones I tried just worked.
- After starting with the binary for R, I then found RStudio. It looks like it is in early development, but it quickly became my default environment. With an editor, workspace and console, it was hard to find something better.
- R has a “batch” processing mode (plus Rscript) which looks interesting for processing data outside the environment.
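I have not done much with the batch mode yet, but a script meant to run outside the interactive environment might look something like the sketch below. The file name `summarize.R` and the fallback to the built-in `mtcars` dataset are my own assumptions for illustration.

```r
# summarize.R (hypothetical file name)
# Run non-interactively with either:
#   Rscript summarize.R data.csv
#   R CMD BATCH summarize.R      (output is written to summarize.Rout)

# Arguments given after the script name on the command line
args <- commandArgs(trailingOnly = TRUE)

# Read the CSV named on the command line, or fall back to the
# built-in mtcars dataset when no path is given
input <- if (length(args) >= 1) read.csv(args[1]) else mtcars

# Keep only numeric columns and compute the mean of each one
numeric_cols <- input[sapply(input, is.numeric)]
col_means <- sapply(numeric_cols, mean, na.rm = TRUE)

print(col_means)
```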
These were just some of my early experiences with R. Overall, I really enjoyed using R, and the ideas started flowing on how I might use it: everything from analyzing spending at home to analyzing data at work. Next, I hope to look at using R with ggplot2 to analyze the results from Apache Bench. If the results turn out interesting, I hope to find time to share them.