R

Statistical computing specialist - data analysis and visualization with ggplot2 and Tidyverse ecosystem

TL;DR

One-liner: R is the statistician’s language - built for data analysis and visualization.

Core Strengths:

  • Built-in statistical functions
  • Best-in-class visualization (ggplot2)
  • Tidyverse for modern data wrangling
  • 20,000+ packages on CRAN

Philosophy

R was built by statisticians for statisticians:

  • Vectors first - Everything is a vector. Operations are vectorized by default.
  • Interactive analysis - Designed for exploratory data analysis, not production systems
  • Functional style - Functions as first-class citizens, immutable data preferred
  • Domain specific - Not general purpose, but excellent at what it does

R prioritizes statistical correctness and ease of analysis over raw performance.

Quick Start

Install

# macOS
brew install r

# Linux (Ubuntu)
sudo apt install r-base

# Windows - download from r-project.org

Verify (latest: 4.5.2)

R --version  # R version 4.5.2

First Program

Create hello.R:

print("Hello, World!")
Rscript hello.R

Interactive R

R
> 2 + 2
[1] 4
> mean(c(1, 2, 3, 4, 5))
[1] 3

Download from posit.co

Language Essentials

Variables & Types

# Variables (use <- for assignment)
name <- "Alice"
age <- 25
price <- 19.99
active <- TRUE

# Vectors (fundamental data structure)
numbers <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Carol")

# Data frames (tables)
df <- data.frame(
  name = c("Alice", "Bob"),
  age = c(25, 30)
)

Control Flow

# if-else
if (age >= 18) {
  print("Adult")
} else if (age >= 13) {
  print("Teen")
} else {
  print("Child")
}

# ifelse (vectorized)
status <- ifelse(age >= 18, "Adult", "Minor")

# for loop
for (i in 1:5) {
  print(i)
}

# But prefer vectorized operations!

Functions

# Define function
greet <- function(name, greeting = "Hello") {
  paste(greeting, name, sep = ", ")
}

greet("Alice")           # "Hello, Alice"
greet("Bob", "Hi")       # "Hi, Bob"

# Anonymous function (R 4.1+)
add <- \(a, b) a + b
add(2, 3)  # 5

Vectorized Operations

x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)

# Operations apply element-wise
x + y      # c(11, 22, 33, 44, 55)
x * 2      # c(2, 4, 6, 8, 10)
x > 3      # c(FALSE, FALSE, FALSE, TRUE, TRUE)

# Functions work on vectors
mean(x)    # 3
sum(x)     # 15
sqrt(x)    # c(1, 1.41, 1.73, 2, 2.24)

Data Manipulation (Tidyverse)

library(dplyr)

# Pipe operator
df |>
  filter(age > 25) |>
  select(name, age) |>
  arrange(desc(age))

# Modern tidyverse style
df |>
  mutate(age_group = ifelse(age >= 30, "Senior", "Junior")) |>
  group_by(age_group) |>
  summarize(count = n(), avg_age = mean(age))

Gotchas

Vectors are 1-indexed

x <- c(10, 20, 30)
x[1]    # 10 (not 0!)
x[2:3]  # c(20, 30)
x[-1]   # c(20, 30) - negative means exclude

<- vs = for assignment

x <- 10  # Preferred, always works
x = 10   # Works, but <- is idiomatic

# Important difference in function calls
mean(x = c(1,2,3))  # x is an argument
mean(x <- c(1,2,3)) # x is assigned AND passed

NA handling

x <- c(1, 2, NA, 4)

mean(x)               # NA (NA propagates!)
mean(x, na.rm = TRUE) # 2.333...

is.na(x)              # c(FALSE, FALSE, TRUE, FALSE)
na.omit(x)            # c(1, 2, 4)

Factor surprises

f <- factor(c("a", "b", "a"))
levels(f)     # c("a", "b")

# Be careful with conversions
as.numeric(factor(c("3", "1", "2")))  # c(2, 1, 3) - NOT c(3, 1, 2)!

# Correct way
as.numeric(as.character(factor(c("3", "1", "2"))))  # c(3, 1, 2)

When to Choose

Best for:

  • Statistical analysis
  • Data visualization
  • Academic research
  • Exploratory data analysis

Not ideal for:

  • Web development
  • Production systems
  • General programming
  • Real-time applications

Comparison:

AspectRPythonJulia
StatsExcellentGoodGrowing
Vizggplot2matplotlibMakie
SpeedSlowMediumFast
CommunityStats/BioGeneralScientific

Next Steps

Ecosystem

Package Management

# Install from CRAN
install.packages("tidyverse")

# Load package
library(tidyverse)

# Check installed packages
installed.packages()

Essential Packages

  • Data: dplyr, tidyr, data.table
  • Visualization: ggplot2, plotly
  • Modeling: caret, tidymodels
  • Reporting: rmarkdown, Quarto