This lab will introduce functions related to combinatorics, along with functions for calculating probabilities for some of the special discrete distributions discussed in lecture. Exercise are created for practice purposes only and will not be graded. These questions will require you to know the difference between a permutation and a combination. Recall that we care about the order of the elements with permutations; with combinations we don’t.
While R serves as a very powerful program in its base form, an ever-growing collection of packages are available to increase its capabilities. R package’s are delevoped and shared by the R community and can include a collection of functions, code, data, documentation, and tests. For instance, the package called gtools includes functions to enumerate permutations and combinations. To make use of it, we need to first need to download and install it into R.
The Comprehensive R Archive Network (CRAN) is the official and most popular repository for R packages. Packages published here need to pass a series of tests before they are accepted which makes it more reliable than random peices of code you find on the interwebs. If the package you are looking for is stored on CRAN (and chances are it is), then you can use the following code to download it into your R library.
install.packages("gtools") # only needs to be run once in your lifetime
library("gtools") # needs to be run every time you reopen R
The Rf{install.packages} function downloads the pacakge {} to your R library. Think about this as buying a book; once you buy it once, you don’t need to buy it again. The Rf{library}Rg{()} function loads this pacakge into your R session so you can use all the functions it contains. You can think about this step as taking the book down from the shelf and viewing its contents. Assuming you put the book back on the shelf every time you finished reading, then each time you want to reference it, you’ll again have to retreive it from the bookshelf. To view the documention of a package, you can type:
help(package="< packagename >")
In the next section we’ll make use of some of the functions in {}.
While R has a huge selection of functions to choose from, inevitablly we may need to perform a specific task that is availabe in function-from in R. Hence it will be useful to learn how to create a user defined function.
In R, the user can create functions using the general form:
functionName <- function(argument1, argument2=default_value, ...) {
# comments
[Statements]
...
return(<something we calculated above>)
}
Here functionName
is the name of the function (we need to follow the same rules for naming functions as we did for naming variables; see lab 1) with any arguments (i.e. inputs) the function takes specified in parenthesis. For example, we might define a function called add
that takes an input argument x
and adds it to the second argument y
as follows:
add <- function(x,y){
x + y
}
add(2,4)
add(2) # y is missing and will produce and error
add(2,4,3) # too many arguments, will produce and error
If we wanted to, we could give y
, for example, a default value. That is, if the user didn’t specify y
we would take it to be its default value of 100, say.
add <- function(x,y=100){
x + y
}
add(2)
## [1] 102
As a default, R will just print out the last line of the function but its always best to explicity state what you want the funtion to output using the Rf{return} function.
add <- function(x,y){
out <- x + y
return(out)
}
add(2,4) # same as before
## [1] 6
If for some reason I wanted to output x
and y
as well, I could do so doing:
add <- function(x,y){
out <- x + y
return(list(out,x,y))
}
add(2.5, -8)
## [[1]]
## [1] -5.5
##
## [[2]]
## [1] 2.5
##
## [[3]]
## [1] -8
We have learned some helpful combinatorics rules that allow us to count things more efficiently. Let’s have a look at how we can count these in R. The function permutations
and combinations
are available in the gtool package in R. Once you have loaded this into your R session you can obtain their help files as you would any other function in R (?permutations
or ?combinations
).
Let’s look at a simple example of rearrange the letters in PAUL
. First lets store the individual letters in a vector object called word
.
word <- c("P","A","U","L")
How many “words” can we create from the letters in PAUL
using each letter exactly once?
In class we got the solution (\(=4!\)). Note that factorials are calculated in R using the factorial
function:
factorial(4)
## [1] 24
Try to get at this answer using the permutation
function.
# collection of four letter words (no repeats)
(paul.no.rep <- permutations(n=4,r=4,v=word))
## [,1] [,2] [,3] [,4]
## [1,] "A" "L" "P" "U"
## [2,] "A" "L" "U" "P"
## [3,] "A" "P" "L" "U"
## [4,] "A" "P" "U" "L"
## [5,] "A" "U" "L" "P"
## [6,] "A" "U" "P" "L"
## [7,] "L" "A" "P" "U"
## [8,] "L" "A" "U" "P"
## [9,] "L" "P" "A" "U"
## [10,] "L" "P" "U" "A"
## [11,] "L" "U" "A" "P"
## [12,] "L" "U" "P" "A"
## [13,] "P" "A" "L" "U"
## [14,] "P" "A" "U" "L"
## [15,] "P" "L" "A" "U"
## [16,] "P" "L" "U" "A"
## [17,] "P" "U" "A" "L"
## [18,] "P" "U" "L" "A"
## [19,] "U" "A" "L" "P"
## [20,] "U" "A" "P" "L"
## [21,] "U" "L" "A" "P"
## [22,] "U" "L" "P" "A"
## [23,] "U" "P" "A" "L"
## [24,] "U" "P" "L" "A"
We can obtain the number of permutations by reading the indexed row number on the last line, or by typing the following:
nrow(paul.no.rep)
## [1] 24
How many “words” can be made from the letters in PAUL
when we allow letters to repeat?
To answer this we simply specify the repeats.allowed
argument to TRUE
.
# collection of four letter words (repeats allowed)
paul.rep <- permutations(n=4,r=4,v=word, repeats.allowed=TRUE)
nrow(paul.rep)
## [1] 256
We could also get this answer using our Fundamental Principle of Counting Theorem (AKA Multiplication of Choices rule). More specifically we have \(n_1 = 4\) choices for letter number 1, \(n_2 = 4\) choices for letter number 2, \(n_3 = 4\) choices for letter number 3, and \(n_4 = 4\) choices for letter number 4. Hence the number of words one can make from the letters in PAUl
is: \[\begin{align*}
&=\boxed{ n_1 }\cdot \boxed{ n_2}\cdot \boxed{n_3}\cdot \boxed{n_4}\\
&=\boxed{4} \cdot \boxed{4} \cdot \boxed{4}\cdot \boxed{4}\\
& = 4^4 = 256
\end{align*}\] To save space, I won’t print them all out here; instead I’ll use the head
/tail
functions to print the first/last 6 rows of the output.
head(paul.rep)
## [,1] [,2] [,3] [,4]
## [1,] "A" "A" "A" "A"
## [2,] "A" "A" "A" "L"
## [3,] "A" "A" "A" "P"
## [4,] "A" "A" "A" "U"
## [5,] "A" "A" "L" "A"
## [6,] "A" "A" "L" "L"
…
tail(paul.rep)
## [,1] [,2] [,3] [,4]
## [251,] "U" "U" "P" "P"
## [252,] "U" "U" "P" "U"
## [253,] "U" "U" "U" "A"
## [254,] "U" "U" "U" "L"
## [255,] "U" "U" "U" "P"
## [256,] "U" "U" "U" "U"
How many three letter words can be made from the letters in PAUL
provided none of the letters can repeat?
threLtrWrds <- permutations(n=4,r=3,v=word)
nrow(threLtrWrds)
## [1] 24
We could have also obtained this solution using Multiplication of Choices: \[\begin{align*} &=\boxed{n_1}\cdot \boxed{n_2}\cdot \boxed{n_3}\\ &=\boxed{4} \cdot \boxed{3} \cdot \boxed{2}\\ & = 4*3*2 = 24 \end{align*}\] Funny enough there is the same number of three letter words as four letter words without repetition found in this question! It makes sense once you realize that deleting the last letter of each of the four letter words result in a unique three letter word.
Suppose there are four scrabble tiles left in a bag: , , , . How many ways are there to select three of them for our scrabble “hand” (hence order doesn’t matter).
To solve this we would need to know the number of combinations rather than permutations. In R this can be done using:
(threeLhands <- combinations(n=4,r=3,v=word))
## [,1] [,2] [,3]
## [1,] "A" "L" "P"
## [2,] "A" "L" "U"
## [3,] "A" "P" "U"
## [4,] "L" "P" "U"
nrow(threeLhands)
## [1] 4
Note that this solution could also be obtained from calculating 4 choose 2 or \({4 choose 2}\). In R we compute this as follows:
choose(4,3)
## [1] 4
Now let’s consider the case where we have non-unique letters. That is, some of the letters in our word are repeating. We introduced some Theorems in class to help us to deal with this situation, but lets see if we can find a way do hard code this in R.
First lets save our word to variable names as we did before:
# I use the strsplit function b/c its faster than typing out
# c("O", "K", "A", ...)
word <- as.vector(strsplit("OKANAGAN", "")[[1]])
word
## [1] "O" "K" "A" "N" "A" "G" "A" "N"
OKANAGAN
(non-unique letters)How many unique “words” can be made using all the letters of OKANAGAN
exactly once?
Suppose the letters were also numbered. \(O K A_1 N_1 A_2 G A_3 N_2\). We know that the following will have repeat “words” since \(O K A_1 N_1 A_2 G A_3 N_2\) would “read” the same as
\(O K A_3 N_2 A_1 G A_2 N_1\) for example. We therefore would need to remove the duplicates in the following:
partI <- permutations(n = length(word), r= length(word), v=word, set=FALSE)
nrow(partI) # too many!
## [1] 40320
Note that we needed to set set
= FALSE
otherwise, we would have got an error because they repeated letters. To remove the duplicates we could use the unique
function.
partII <- unique(partI)
nrow(partII)
## [1] 3360
Let’s verify this solution by hand: The number of ways of arranging \(n\) objects of \(k\) different kinds of which \(n_1\) are of one kind, \(n_2\) are of the second kind, , \(n_k\) are of the \(k\)th kind, and \(n_1 +n_2 +\dots+n_k =n\) is of which \(m\) are identical is \[\frac{n!}{n_1!n_2! \cdots n_k!}\]
Hence we have \(\dfrac{8!}{3!2!} = 3360\) as we have above.
Caution: The number of combinations and permutations increases rapidly with \(n\) and \(r\). To use values of \(n\) above about 45, you will need to increase R’s recursion limit (beyond the scope of this course).
factorial(x)
), \(^nC_r\) (choose(n,r)
), it does not have a function readily available to calculate \(^nP_r\).
Create your own function called perm
to calculate \(^nP_r = \dfrac{n!}{(n-r)!}\).
Test your perm
function by calculating \(^{10}P_4\) (your answer should be 5040).
How many ways could you sample 4 marbles from the urn simultaneously? ie. without replacement and order doesn’t matter.
How many ways could you sample 4 marbles and place them in a row? i.e. from the urn without replacement (this time order matters).
library(gtools)
x <- c('red', 'blue', 'green', 'black', 'yellow', 'purple')
urn <- permutations(n=length(x),r=4,v=x)
nrow(urn)
## [1] 360
# alternatively
perm(length(x), 4)
## [1] 360
In lecture we have discussed some special discrete distributions such as the Binomial, and Poisson distribution. We have used the probability distribution function \(f(x)\) (more commonly referred to as a probability mass function, or pmf, for discrete random variables) to calculate the \(P(X=x)\) and cumulative distribution function \(F(x)\), or cdf, to calculate the \(P(X\leq x)\).
R has built-in functions to caclculate these probabilities for a number of well-known distributions. For instance, if \(X\sim\text{Binomal}(n,p)\), the \(f(x) = P(X=x)\) can be calculated using the commanddbinom(
\(x\), size=
\(n\), prob=
\(p\))
pbinom(
\(q\), size=
\(n\), prob=
\(p\))
More generally, the respective pmf and cdf function in R are computed using d
p
Recall from lecture:
A Bernoulli trial is an experiment with only two possible outcomes: success and failure. If \(X \sim \text{Bernoulli}(p)\), then
A random variable \(X\) has a , denoted by \(X\sim \mbox{Bernoulli}(p)\), if and only if it has probability mass function: \[f(x)=p^x(1-p)^{1-x} ~~~~~~\mbox{for }x=0, 1\] where \(0\leq p=P(X=1) \leq 1\).
The Binomial Probability Distribution comes about as the result of \(n\) independent Bernoulli trials, each trial having success probability \(p\). The probability of obtaining \(x\) successes in these \(n\) trials is given by \[%P(X=x)= %{^n}C_x f(x ; n, p) = {n \choose x} p^x(1-p)^{n-x}\] for \(x = 0, 1, 2, \dots,n\) where \(0 \leq p \leq 1\) and \(n\) is a non-negative integer.
A fair coin is tossed six times.
If \(X\) counts the number of heads landing heads up in six tosses of a fair coin, then \(X\sim \text{Binomial}(n = 6,p = 0.5)\). We need to find P\((X=4)\). In , this is calculated:
dbinom(4,size=6,prob=0.5)
## [1] 0.234375
We need to \(P(X\leq 4)\) for RV \(X\sim \text{Binomial}(n = 6,p = 0.5)\). In R that is calculated using:
pbinom(4,size=6,prob=0.5)
## [1] 0.890625
Notice how this is the same as calculating:
\(P(X\leq 4) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)\):
dbinom(0,size=6,prob=0.5) +
dbinom(1,size=6,prob=0.5) +
dbinom(2,size=6,prob=0.5) +
dbinom(3,size=6,prob=0.5) +
dbinom(4,size=6,prob=0.5)
## [1] 0.890625
As noted in the help file for dbinom
(see ?dbinom
), we can actually supply a vector for the first argument. If we supply a vector as the first argument we get a vector of the individual probabilities back. Hence the following code will give a vector containing \(P(X=0)\), \(P(X=1)\), \(P(X=2)\), \(P(X=3)\), \(P(X=4)\):
dbinom(c(0,1,2,3,4),size=6,prob=0.5)
## [1] 0.015625 0.093750 0.234375 0.312500 0.234375
If we take the sum of this output, we get yet another way of calculating \(P(X\leq 4)\):
# N.B. 0:4 creates the vector c(0,1,2,3,4)
sum(dbinom(0:4,size=6,prob=0.5))
## [1] 0.890625
Note that \(P(X < 4) = P(X \leq 3)\). That is, if there are less than 4 heads facing up, that means that there must be less than or equal 3 heads. Hence the probabilty would be calculated as:
pbinom(3,size=6,prob=0.5)
## [1] 0.65625
which is the same as\ $P(X< 4) = P(X) = P(X=0) + P(X=1) + P(X=2) + P(X=3) $:
sum(dbinom(0:3,size=6,prob=0.5))
## [1] 0.65625
This can be calculated by specifying lower.tail=FALSE
.
pbinom(4, size = 6, prob=0.5, lower.tail = FALSE)
## [1] 0.109375
This is equivalent to:
\(P(X > 4) = 1 - P(X \leq 4)\):
1 - pbinom(4, size = 6, prob=0.5)
## [1] 0.109375
which is also equivalent to:
\(P(X > 4) = P(X = 5) + P(X = 6)\):
dbinom(5,size=6,prob=0.5) + dbinom(6,size=6,prob=0.5)
## [1] 0.109375
# or
sum(dbinom(c(5,6),size=6,prob=0.5))
## [1] 0.109375
Note that \(P(X \geq 4) = P(X > 3)\). That is, if there are 4 or more heads facing up, that means that there must be strictly greater than 3 heads. Hence the probabilty would be calculated as \(P(X > 3)\):
pbinom(3, size = 6, prob=0.5, lower.tail = FALSE)
## [1] 0.34375
Which is equivalent to; \(P(X \geq 4) = 1 - P(X < 4) = 1 - P(X \leq 3)\)
1 - pbinom(3, size=6, prob=0.5)
## [1] 0.34375
Alternatively we could calculate: \(P(X \geq 4) = P(X=4) + P(X=5) + P(X=6)\)
dbinom(4,size=6,prob=0.5) + dbinom(5,size=6,prob=0.5) +
dbinom(6,size=6,prob=0.5)
## [1] 0.34375
# or
sum(dbinom(4:6,size=6,prob=0.5))
## [1] 0.34375
Recall that the Binomial distribution is simply the sum of \(n\) independent and identical Bernoulli trials. To put another way, the Bernoulli distribution is a special case of the Binomial distribution, i.e. when \(n=1\). Therefore, we can use these functions to answer Bernoulli distribution questions.
If \(X\) = 1 if a coin lands heads, and \(X\) = 0 if the coin lands tails, \(X \sim \text{Bernoulli}(p = 0.5)\). What is the probability that the coin lands tails?
Another way to say this is that \(X \sim \text{Binomial}(n = 1, p = 0.5)\). The probability that the coin lands tails, is \(P(X=0)\)
dbinom(x=0,size=1,prob=0.5)
## [1] 0.5
The poisson distribution is used when we are counting the number of events occurring in a fixed interval of space or time. Let \(X\) count the number of events occurring in a fixed interval, when events occur randomly and independently with constant rate.% \(\lambda\) Then \(X\) follows a Poisson distribution with %rate (or mean) parameter \(\lambda\) having pmf given by: \[\begin{align} f(x; \lambda) =e^{-\lambda}\frac{\lambda^x}{x!}, && x=0,1,2,\ldots \end{align}\]
where \(\lambda > 0.\) This distribution has a single rate parameter, \(\lambda\). Recall that the rate parameter \(\lambda\) will need to be consistent with the interval in the question. That is to say, if we are counting the number of occurrences in a hour, the rate should be in the form of \(\#\) per hour, if we are counting the number of occurrences in 15 minutes, our rate should be in the form of \(\#\) per 15 minutes, etc.
Customers arrive at an average rate of 3.7/hour. If the store opens at 8:00 what is the probability that there are at least two arrivals by 8:45?
If \(X\) counts the number of arrivals to the store in a 45 minute interval than \(X \sim \text{Poisson}(\lambda = 2.775)\). Notice how we convert the rate from arrivals per hour to arrivals per 45 minutes.
# 3.7/hour = x/45 minutes? x =
3.7*45/60
## [1] 2.775
\(P(X\geq 2)\) is therefore either calculated by:
1 - ppois(1, lambda = 2.775)
## [1] 0.7646307
ppois(1, lambda = 2.775, lower.tail = FALSE)
## [1] 0.7646307
1 - (dpois(0, lambda = 2.775) + dpois(1, lambda = 2.775))
## [1] 0.7646307
Find the probability that \(X\) is less than 5.
Find the probability that \(X\) is greater than 7.
Find the probability that \(X\) is greater than or equal to 4.
Find the probability that it receives 4 hits in a given minute.
Find the probability that it receives at least 100 hits in 45 minutes.