Tutorial 1: R, syntax, atomic elements and console.
Introduction
To encourage new users, these tutorials aim to teach you Data Science using the R language in the simplest way. Learning Data Science goes hand in hand with gaining skills in a computer programming language. As covered in the first lecture, R is a free, open-source language.
I have identified, through my experience of teaching R, six main building blocks of the language. We will cover these blocks in the following way:
- Week 1: Learn R and its integration with RStudio, the operators, the atomic elements and basic operations in the console.
- Week 2: Learn basic `object` creation and manipulation: `vectors`, `matrices`, `dataframes`, `arrays`, `lists`.
- Week 3: How to read/save data from R, and how to perform descriptive statistics and linear regression.
- Week 4: Basic Data Visualization Using R.
- Week 5: Basic Algorithms 1: Use of `functions`, `loops`, `while`, `if`, `else` statements.
- Week 6 onward: we will deploy our skills running Machine Learning (ML) algorithms and String Analysis.
Instructions for the tutorials.
The exercises are simple and designed to build the intuition and confidence you need to climb the learning curve!
Source: Valamis, 2020
- Each week, take a look at this repository and read the corresponding tutorial.
- On this website, I post the instructions and the solutions to all the exercises.
- Your task is to complete all the exercises of each tutorial in an R script; you can download the template.
- Here you can see an example of what to do in each tutorial:
Tip: If the image is too small, download it and view it on your computer.
The R operators
Operators are special characters that perform an action with a specific result. The most common arithmetic operators are listed below so you can get familiar with them. These operators are not unique to R; you may know them from spreadsheets, calculators or other languages. CRAN provides an exhaustive list of operators.
Operator | Description |
---|---|
- | Minus. |
+ | Plus. |
: | Sequence. |
* | Multiplication. |
/ | Division. |
^ | Exponentiation. |
%% | Modulus. |
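As a quick illustration of the table above, the following sketch applies each operator once. The numbers are arbitrary and chosen only for this example; the expected result is given in the comment next to each line.

```r
# Each line uses one operator from the table.
7 - 2      # minus: 5
7 + 2      # plus: 9
1:5        # sequence: 1 2 3 4 5
7 * 2      # multiplication: 14
7 / 2      # division: 3.5
7^2        # exponentiation: 49
7 %% 2     # modulus, the remainder of 7 divided by 2: 1
```

Typing any of these lines in the console and pressing Enter prints the result preceded by `[1]`.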
Atomic elements.
The atomic elements are the most fundamental building blocks of the R language and its most essential input. Here are the main types:
Atomic Element | Type |
---|---|
1L, 2L, 3L, 10L | Integer. |
1.1, 4.5, 3.2 | Numeric. |
"A", "B", "3" | Character. |
TRUE, FALSE | Logical. |
NA | Not Available / Missing Values. |
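One way to check the type of an atomic element is the `class` function. The sketch below is only an illustration of the table above; the comments show what R reports for each value.

```r
class(1L)      # "integer": the L suffix marks an integer
class(1.1)     # "numeric"
class("A")     # "character"
class("3")     # "character": the quotes make it text, not a number
class(TRUE)    # "logical"
class(NA)      # "logical": a bare NA is a logical missing value
is.na(NA)      # TRUE
```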
Lines and chunks of code.
Now we can combine operators and atomic elements to write lines of code. R runs these lines of code following two rules:
- Similar to math, each line is read from left to right, with operator precedence deciding which operations are evaluated first.
- Similar to a book, lines are read and evaluated from top to bottom.
Lines of code are used to perform analysis, evaluate statements and any sort of operation that you can imagine. In contrast to independent lines of code, a group of two or more lines used for a particular operation is called a `chunk` of code. But before we get into chunks, practice your knowledge of operators and atomic elements by performing basic arithmetic operations.
Other resources:
If you want to increase your knowledge of R and accelerate your learning, check out:
- The excellent Introduction to R from Datacamp.
- The Comprehensive R Archive Network (CRAN), which also has a more comprehensive manual.
- The videos of Advance Your Skills as an R Expert, from LinkedIn, which you can watch for free.
Exercise 1: Use R as a calculator.
Instruction: Solve the following exercises by writing the correct atomic elements and operators to perform each arithmetic operation. Each exercise can be solved in different ways, but the result should be the same.
Writing tips
- Use the simplest combination of atomic elements and operators. For instance, if asked to perform `2 plus 3`, avoid using extra atomic elements and operators by typing `2 + 1 + 1 + 1`. The best answer is not only correct but also the simplest. Example: `2 + 3`.
- Leave a white space between each operator and atomic element. For instance, write `2 + 3` but not `2+3`. An exception to this rule is powers, where the unspaced form `3^3` is preferred.
Solve:
- 1.1 Addition: 2 plus 3.
[1] 5
- 1.2 Subtraction: 5 minus 4.
5 - 4
[1] 1
- 1.3 Multiplication: 3 times 5.
[1] 15
- 1.4 Division: 10 divided by 2.
[1] 5
- 1.5 Exponentiation: 4 to the power of 3.
[1] 64
- 1.6 Modulus: 5 modulo 2.
[1] 1
Exercise 2: Use of parentheses.
Use parentheses `()` to group expressions only if needed. For instance, `(3 + 2)` is equivalent to `2 + 3`, but the latter uses fewer operators and is better. Parentheses are used correctly when they specify the order in which R performs operations. For instance, `9 + 1/2` is not the same as `(9 + 1)/2`. In the former expression, `9` is added to `1/2`; in the latter, `(9 + 1)` is solved first and the result is then divided by `2`. Remember that R follows operator precedence: without parentheses, division is evaluated before addition, and operators that share the same precedence, such as `+` and `-`, are evaluated from left to right.
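To make the effect of parentheses concrete, here is a short sketch with a few expressions evaluated with and without grouping. The numbers are illustrative only and are not taken from the exercises.

```r
9 + 1 / 2      # 9.5: division runs before addition
(9 + 1) / 2    # 5: parentheses force the addition first
2 * 3^2        # 18: exponentiation runs before multiplication
(2 * 3)^2      # 36: parentheses force the multiplication first
-2^2           # -4: exponentiation runs before the unary minus
(-2)^2         # 4
```

When in doubt about the order of evaluation, `?Syntax` in the console lists R's operators by precedence.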
- 2.1 Solve 10 plus 5 and divide by 3. Hint: the solution is 5.
(10 + 5)/3
[1] 5
Solve the next exercises:
- 2.2 Solve 6 plus 3 to the power of 2 and divide by 3. Hint, the solution is 27.
[1] 27
- 2.3 Solve 16 to the power of 1/2 multiplied by 3. Hint, the solution is 12.
[1] 12
- 2.4 Multiply 3 times 12 plus 4, all divided by 2. Hint, the solution is 24.
[1] 24
- 2.5 Divide 4 by 2 times 3, all that times 18 minus 6. Hint, the solution is 8
[1] 8
Exercise 3: Use of logical operators
Logical operators are used to assess relationships between atomic elements. Here are some examples:
- The snippet `4==4` will return `TRUE` in the console.
- The snippet `4==3` will return `FALSE` in the console.
- The snippet `7!=7` will return `FALSE` in the console because `7` is not different from `7`.
- The snippet `5>3` will return `TRUE`.
- Similarly, `5<7` will return `TRUE`.
- So will `2<=4`, because `2` is smaller than or equal to `4`.
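The comparisons listed above can be tried directly in the console. The sketch below uses different, arbitrary values and also shows how `&` (and), `|` (or) and `!` (not) combine logical results.

```r
10 == 10            # TRUE
10 != 10            # FALSE
8 > 6               # TRUE
8 >= 9              # FALSE
"A" == "a"          # FALSE: character comparison is case-sensitive
(3 < 4) & (4 < 3)   # FALSE: both sides must be TRUE
(3 < 4) | (4 < 3)   # TRUE: one TRUE side is enough
!(3 < 4)            # FALSE: ! negates a logical value
```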
Solve:
- 3.1 Assess whether TRUE is equal to TRUE, and then whether that result is equal to FALSE. Tip: you have to group using parentheses.
(T == T) == F
[1] FALSE
- 3.2 Verify if 4 is greater than 6.
[1] FALSE
- 3.3 Verify whether 7 is less than 2.
[1] FALSE
- 3.4 Verify whether 6 times 7 is less than 8 times 9.
[1] TRUE
- 3.5 Vectorized: evaluates element by element. DO NOT CHANGE THE CODE!
c(2, 3, 4) | c(2, 3, 4) == c(2, 3, 4)
[1] TRUE TRUE TRUE
- 3.6 Not vectorized: outputs a single statement. DO NOT CHANGE THE CODE!
c(2, 3, 4) || c(2, 3, 4) == c(2, 3, 4)
[1] TRUE
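The difference between element-wise and single-value logical operators can be sketched as follows. The vector `x` is a hypothetical example. Note that since R 4.3.0 the operators `||` and `&&` raise an error when either operand has more than one element, so on recent versions exercise 3.6 may fail as written; the sketch therefore uses `||` only with single values and reduces vectors with `any()` and `all()`.

```r
x <- c(2, 3, 4)

# Element-wise operators: one result per position.
x == c(2, 0, 4)     # TRUE FALSE TRUE
x > 2 | x == 2      # TRUE TRUE TRUE
x > 2 & x < 4       # FALSE TRUE FALSE

# Single-value operators: one result overall.
TRUE || FALSE       # TRUE
TRUE && FALSE       # FALSE

# Reduce a logical vector to a single value.
any(x > 3)          # TRUE: at least one element is greater than 3
all(x > 3)          # FALSE: not every element is greater than 3
```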
Exercise 4: Objects
R is an object-oriented programming language, which means, put simply, that everything aside from the operators and syntax is defined as an object.
These objects have specific attributes that can be retrieved with functions such as class `class(x)`, structure `str(x)` and type `typeof(x)`. Unidimensional objects such as vectors and lists are compatible with `length(x)`, which returns their number of elements. However, more complex (multidimensional) objects such as matrices and data frames have dimensions, returned by `dim(x)`, as well as `nrow(x)` and `ncol(x)`, which return the number of rows and columns respectively.
Pay attention to the use of the left assignment `<-` for storing objects; the parentheses `()` for declaring arguments of functions; and square brackets `[]` or the dollar sign `$` for subsetting objects.
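The functions and symbols described above can be seen together in a small sketch. The objects `x` and `df`, and their contents, are hypothetical and exist only for this illustration.

```r
# Store a named numeric vector with the left assignment operator.
x <- c(first = 10, second = 20, third = 30)
class(x)         # "numeric"
typeof(x)        # "double"
length(x)        # 3
str(x)           # compact display of the structure
x[2]             # subsetting by position
x["third"]       # subsetting by name

# A small data frame is a two-dimensional object.
df <- data.frame(id = 1:3, score = c(4.5, 3.2, 1.1))
dim(df)          # 3 2
nrow(df)         # 3
ncol(df)         # 2
df$score         # the score column: 4.5 3.2 1.1
df[2, "score"]   # row 2 of the score column: 3.2
```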
Vectors (Also called atomic vectors).
Solve:
- 4.1 Declare a `NULL` vector called `z`, then assess the `class`, `length` and `typeof` of the object.
[1] "NULL"
[1] 0
[1] "NULL"
- 4.2 Declare a `numeric` `vector` with two elements `c(1,2)` called `a`. Then, add names `c('one', 'two')` to the elements of `a` with the `names` function. Print the `names` of `a`.
[1] "one" "two"
- 4.3 Extract the element of `a` called `two` and print the output in the console.
two
2
- 4.4 Create a `vector` with two integers; remember to use the `L` suffix after the number. For instance, `1L`, but not `1`. Call this vector `b` and print the `class` in the console.
[1] "integer"
- 4.5 Create a logical vector with two elements `c(T, F)`, call this vector `c`. Then use this vector to subset the vector `b` and print the result in the console.
[1] 2
- 4.6 Use the built-in constant `LETTERS` to create a `character` `vector` with the first 10 letters of the alphabet. Call this vector `d` and print it in the console.
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
- 4.7 Create a `factor` called `e`, which is a categorical variable, with two elements `c('male', 'female')` using the function `factor`; print the `class` of the vector in the console.
[1] "factor"
- 4.8 Create an ordinal variable called `f` with the function `factor`, using the `labels` `c('poor', 'good', 'excellent')`; print the `vector` and check that `is.ordered` returns a `TRUE` value.
[1] poor good excellent
Levels: poor < good < excellent
[1] TRUE
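Because factors behave differently from plain character vectors, a brief sketch may help before moving on. The variables `colour` and `size` are hypothetical examples; the point is that a factor is only ordinal when it is created with `ordered = TRUE`, and that `levels` fixes the ranking.

```r
# Unordered factor: categories without a ranking.
colour <- factor(c("red", "blue", "red"))
class(colour)        # "factor"
levels(colour)       # "blue" "red": sorted alphabetically by default
is.ordered(colour)   # FALSE

# Ordered factor: levels sets the ranking, ordered = TRUE records it.
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)
size                 # prints Levels: small < medium < large
is.ordered(size)     # TRUE
```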
Matrices, Data Frames and Lists.
- 5.1 Create a `matrix` of `5x5` with the numbers from `1L` to `25L`. Put the elements in ascending order in every `row`, call this matrix `A` and print it in the console.
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
[5,] 21 22 23 24 25
- 5.2 Use the function `rbind` to arrange three `character` `vectors` with two elements each by row. Call this matrix `B` and print it in the console.
[,1] [,2]
a "A" "B"
b "C" "D"
c "E" "F"
- 5.3 Assess the class of matrices `A` and `B`, print the output in the console.
[1] "matrix" "array"
[1] "matrix" "array"
- 5.4 Extract the 25th element of the matrix `A`, print the output in the console.
[1] 25
- 5.5 Extract the second column of the matrix `B` and print it in the console.
a b c
"B" "D" "F"
- 5.6 Transform the matrix `A` into a `data.frame` and call it `df1`; print the `dim` (dimensions) of this object.
[1] 5 5
- 5.7 Create 5 vectors, `integer`, `numeric`, `logical`, `character` and `factor`, and call these vectors `a`, `b`, …, `e`, with five elements each. Then use these vectors to create a `data.frame` with five columns, one for each vector. Call this last object `df2` and print the output in the console.
a b c d e
1 1 6 TRUE k P
2 2 7 FALSE l Q
3 3 8 TRUE m R
4 4 9 FALSE n S
5 5 10 FALSE o T
- 5.8 Assess the `class` of each column of `df2` and print the output in the console.
$a
[1] "integer"
$b
[1] "integer"
$c
[1] "logical"
$d
[1] "character"
$e
[1] "factor"
- 5.9 Create a list, called `L1`, that contains the matrices `A` and `B`, the data frames `df1` and `df2`, and the vectors `b`, `c` and `e`. Print the output in the console.
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
[5,] 21 22 23 24 25
[[2]]
[,1] [,2]
a "A" "B"
b "C" "D"
c "E" "F"
[[3]]
V1 V2 V3 V4 V5
1 1 2 3 4 5
2 6 7 8 9 10
3 11 12 13 14 15
4 16 17 18 19 20
5 21 22 23 24 25
[[4]]
a b c d e
1 1 6 TRUE k P
2 2 7 FALSE l Q
3 3 8 TRUE m R
4 4 9 FALSE n S
5 5 10 FALSE o T
[[5]]
[1] 6 7 8 9 10
[[6]]
[1] TRUE FALSE TRUE FALSE FALSE
[[7]]
[1] P Q R S T
Levels: P Q R S T
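For reference, the following sketch shows the main tools used in this block of exercises on much smaller, hypothetical objects (`M`, `N` and `L` are names invented for this example). It is an illustration under those assumptions, not the official solution.

```r
# byrow = TRUE fills the matrix row by row instead of column by column.
M <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
M[2, 3]      # row 2, column 3: 6
M[, 2]       # second column: 2 5
M[6]         # sixth element, counting down the columns: 6

# rbind stacks vectors as rows; the argument names become row names.
N <- rbind(x = c("a", "b"), y = c("c", "d"))
dim(N)       # 2 2
N[, 2]       # second column, named by row: "b" "d"

# A list can hold objects of different classes; [[ ]] extracts one element.
L <- list(M, N, c(TRUE, FALSE))
length(L)    # 3
L[[3]]       # TRUE FALSE
```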
Exercise 6: Theoretical Questions.
- 6.1 How is Data Science different from statistics and mathematics?
- 6.2 Explain one advantage of Big Data in comparison to survey data or experimental data.
- 6.3 Explain the main activities in Data Science that take place before the analysis.