Introduction

To encourage new users, the purpose of these tutorials is to to teach you Data Science using the R-language in the simplest way . Learning Data Science goes hand in hand with gaining skills with a computer programming language. As it was covered in the first lecture, R is a free open source language

I have identified, through my experience of teaching R, six main building blocks of the language. We will cover these blocks in the following way:

  • Week 1: Learn R and its integration with R-studio, the operators, the atomic element and basic operations in the console.
  • Week 2: Learn basic object creation and manipulation: vectors, matrices, dataframes, arrays, lists.
  • Week 3: How to read/save data from R, and how to perform descriptive statistics and linear regression.
  • Week 4: Basic Data Visualization Using R.
  • Week 5: Basic Algorithms 1: Use of functions, loops, while, if, else statements.
  • Week 6 onward, we will deploy our skill running Machine Learning (ML) algorithms and String Analysis.

Instroctutions for the tutorials.

The exercises are simple, designed to develop intuition, confidence to step the learning curve!

Learning Curve

Source: Valamis, 2020

  1. Each weak, you will take look in this repository and read the corresponding tutorial exercise for each week.
  2. Here in this website, I am posting the instructions and the solutions to all the exercises.
  3. Your task is to perform all the excercises of each tutorial in an Rscrip, you can download the template.
  4. Here you can see an example of what to do in each tutorial:

How to solve the
excercises Tip: If the image is too small, download it, and view in your computer.


The R operators

The operators are special characters that perform an action with a specific result. The most common arithmetic operators to get some familiarity. These operators are not only common for R but you may know them from spreadsheets, calculators or other languages.

CRAN has an exhaustive list of operators that you can retrieve here

List of Operators
- Minus.
+ Plus.
: Sequence.
* Multiplication.
/ Division.
^ Exponentiation.
%% Modulus.

Atomic elements.

The atomic elements are the most fundamental building blocks of the R language. They are the most essential input of the language, here are the main types:

Atomic Element
1L, 2L, 3L, 10L Integer.
1.1, 4.5, 3.2
Numeric.
“A”, “B”,“3” Character.
TRUE, FALSE
Logical.
NA
Not Available / Missing Values.

Lines and chunks of code.

Now we can use operators + atomic elements to write lines of code. R runs these lines of code following these two rules:

  1. Similar to math, lines are read and evaluated from left to right.
  2. Similar to a book, lines are read and evaluated from top to button.

Lines of code are used to perform analysis, evaluate statements and any sort of operation that you can imagine. Different from independent lines of code, a group of two or more lines used for a particular operation is called a chunk of code. But, before we get into them, practice your knowledge of operators and atomic elements to perform basic arithmetic operations.

Other resources:

If you want to increase your knowledge of R and accelerate your learning check out:

Exercise 1: Use R as a calculator.

Instruction: Solve the following exercises by writing the correct atomic elements and operators to perform each arithmetic operation. Each exercise can solve in different ways, but the solution should be the same.

Writing tips

  1. Use the most simple combination of atomic elements and operators. For instance, if asked to perform 2 plus 3, avoid using more atomic elements and operators typing 2 + 1 + 1 + 1. The best answer is not only correct but also the simplest. Example: 2 + 3.
  2. Leave a space or white space in between each operator and atomic element. For instance, write 2 + 3 but not 2+3. An exception to this rule is to write powers, which 3^3 is preferred but not


Solve:

  • 1.1 Addition: 2 plus 3.
[1] 5
  • 1.2 Subtraction: 5 minus 4. 5 - 4
[1] 1
  • 1.3 Multiplication: 3 times 5.
[1] 15
  • 1.4 Division: 10 divided by 2.
[1] 5
  • 1.5 Exponentiation: 4 to the power of 3.
[1] 64
  • 1.6 Module: 5 module 2
[1] 1

Exercise 2: Use of parentheses.

Use parentheses () to group expressions only if needed. For instance, (3 + 2) is equivalent to 2 + 3, but the latter answer uses fewer operators, and it is better. Parentheses are used correctly to specify the order in which R performs operations. For instance, 9 + 1/2 is not the same as (9 + 1)/2. In the former expression, 9 is added to 1/2, but in the latter (9 + 1) is solved first and then divided (9 + 1)/2. Remember that lines are evaluated from left to right.


  • 2.1 Solve 10 plus 5 and divide by 3. Hint solution is 5.
(10 + 5)/3
[1] 5

Solve, the next exercises:

  • 2.2 Solve 6 plus 3 to the power of 2 and divide by 3. Hint, the solution is 27.
[1] 27
  • 2.3 Solve 16 to the power of 1/2 multiplied by 3. Hint, the solution is 24.
[1] 24
  • 2.4 Multiply 3 times 12 plus 4, all divided by 2. Hint, the solution is 24.
[1] 24
  • 2.5 Divide 4 by 2 times 3, all that times 18 minus 6. Hint, the solution is 8
[1] 8

Exercise 3: Use of logical operators

The logical operators is used to assess relationships between atomic elements. Here you have some examples:

  • The snipped 4==4, will return a TRUE in the console.
  • The snipped 4==3, will return a FALSE in the console.
  • The snipped 7!=7, will return a FALSE in the console because 7 is not different from 7.
  • The snipped 5>3, will return a FALSE.
  • Similarly, 5<7, will return a FALSE.
  • But not, 2<=4, that will return TRUE, because 2 is smaller or equal to 4


Solve:

  • 3.1 Assess if TRUE is equal to TRUE, and then to FALSE. Tip, you have to group using parenthesis.
(T == T) == F
[1] FALSE
  • 3.2 Verify if 4 is greater than 6.
[1] FALSE
  • 3.3 Verify that 7 is less than 2.
[1] FALSE
  • 3.4 Verify that 6 times 7 is less than 8 times 9.
[1] FALSE
  • 3.5 Vecorized: Evaluates element by element. DO NOT CHANGE THE CODE!
c(2, 3, 4) | c(2, 3, 4) == c(2, 3, 4)
[1] TRUE TRUE TRUE
  • 3.6 Not vectorlized: outputs a single statement. DO NOT CHANGE THE CODE!
c(2, 3, 4) || c(2, 3, 4) == c(2, 3, 4)
[1] TRUE

Excersise 4: Objects

R is an object-oriented programming language, which means for simplicity that everything aside the operators and syntax is defined an object.

This objects have specific attributes which can be retrieved by functions, such as: class class(x), structure str(x); type of typeof(x). Unidimensional objects such as vectors and lists are compatible with the length(x) which returns their number of elements for instance. However, more complex objects (multidimensional), such as matrices and data frames have dimensions, returned by dim(x), also nrow(x) and ncol(x) to estimate the number of rows and columns respectively.

Pay attention to the use of the left assignment <- for storing objects; the parenthesis () for declaring arguments of functions; and squared brakets [] or the dollar sign $ for subsetting objects.

Vectors (Also called atomic vectors).


Sole:

  • 4.1 Declare a NULL vector called z, then assess the class, length and typeof object.
[1] "NULL"

[1] 0

[1] "NULL"
  • 4.2 Declare a numeric vector with two elements c(1,2) called a. Then, add names c('one', 'two') to the elements of a with the names function. Print the names of a.
[1] "one" "two"
  • 4.3 Extract the element of a called two and print the output in the console.
two 
  2 
  • 4.4 Create a vectorwith two integers, remember to use the L operator after the number. For instance, 1L, but not 1. Call this vector `b and print the class in the console.
[1] "integer"
  • 4.5 Create a logical vector with two elements c(T, F), call this vector c. Then use this vector to print the vector b in the console.
[1] 2
  • 4.6 Use the function LETTERS to create a character vector with the first 10 letters of the alphabet. Call this vector d and print it in the console.
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
  • 4.7 Create a factor called e, which is a categorical variable, with two elements c('male', 'female') with the function factor, print the class of vector in the console.
[1] "factor"
  • 4.8 Create an ordinal variable with the function factor called f, with the labels c('poor', 'good', 'excellent'); print the vector and check that is.ordered returns a TRUE value.
[1] poor      good      excellent
Levels: poor < good < excellent

[1] TRUE

Matrices, Data Frames and Lists.

  • 5.1 Create a matrix of 5x5 with the numbers from 1L to 25L. Put the element in ascendant order in every row and call it this matrix A, print it in the console.
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
[5,]   21   22   23   24   25
  • 5.2 Use the function rbind to arrange three character vectors with two elements each by row. Call this matrix B and print it in the console.
  [,1] [,2]
a "A"  "B" 
b "C"  "D" 
c "E"  "F" 
  • 5.3 Assess the class of matrices A and B, print the output in the console.
[1] "matrix" "array" 

[1] "matrix" "array" 
  • 5.4 Extract the 25th element of the matrix A, print the output in the console.
[1] 25
  • 5.5 Extract the second column of the matrix B and print it in the console.
  a   b   c 
"B" "D" "F" 
  • 5.6 Transform the matrix A into a data.frame and call it df1, print the dim, dimmension, of this object.
[1] 5 5
  • 5.7 Create 5 vectors, integer, numeric, logical, character and factor, call this vectors a, b, …, e, with three elements respectively. Then use this vectors to create a data.frame with five columns, one for each vector. Call this last object df2 and print the output in the console.
  a  b     c d e
1 1  6  TRUE k P
2 2  7 FALSE l Q
3 3  8  TRUE m R
4 4  9 FALSE n S
5 5 10 FALSE o T
  • 5.8 Assess the class of each column of the df2 and print the output in the console.
$a
[1] "integer"

$b
[1] "integer"

$c
[1] "logical"

$d
[1] "character"

$e
[1] "factor"
  • 5.9 Create a list, called L1, that contains matrices A, B, and the data frames df1 and df2, and the vectors b, c and e. Print the output in the console.
[[1]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
[5,]   21   22   23   24   25

[[2]]
  [,1] [,2]
a "A"  "B" 
b "C"  "D" 
c "E"  "F" 

[[3]]
  V1 V2 V3 V4 V5
1  1  2  3  4  5
2  6  7  8  9 10
3 11 12 13 14 15
4 16 17 18 19 20
5 21 22 23 24 25

[[4]]
  a  b     c d e
1 1  6  TRUE k P
2 2  7 FALSE l Q
3 3  8  TRUE m R
4 4  9 FALSE n S
5 5 10 FALSE o T

[[5]]
[1]  6  7  8  9 10

[[6]]
[1]  TRUE FALSE  TRUE FALSE FALSE

[[7]]
[1] P Q R S T
Levels: P Q R S T

Excersise 6: Theoretical Questions.

  • 6.1 How does Data Science is different from statistics and mathematics?

  • 6.2 Explain one advantage of Big data in comparison to survey data or experimental data.

  • 6.3 Explain the main activities in Data Science that take place before the analysis.