1 Introduction

This lab will introduce functions related to combinatorics, along with functions for calculating probabilities for some of the special discrete distributions discussed in lecture. Exercise are created for practice purposes only and will not be graded. These questions will require you to know the difference between a permutation and a combination. Recall that we care about the order of the elements with permutations; with combinations we don’t.

1.1 Packages

While R serves as a very powerful program in its base form, an ever-growing collection of packages are available to increase its capabilities. R package’s are delevoped and shared by the R community and can include a collection of functions, code, data, documentation, and tests. For instance, the package called gtools includes functions to enumerate permutations and combinations. To make use of it, we need to first need to download and install it into R.

The Comprehensive R Archive Network (CRAN) is the official and most popular repository for R packages. Packages published here need to pass a series of tests before they are accepted which makes it more reliable than random peices of code you find on the interwebs. If the package you are looking for is stored on CRAN (and chances are it is), then you can use the following code to download it into your R library.

install.packages("gtools")  # only needs to be run once in your lifetime
library("gtools")           # needs to be run every time you reopen R

The Rf{install.packages} function downloads the pacakge {} to your R library. Think about this as buying a book; once you buy it once, you don’t need to buy it again. The Rf{library}Rg{()} function loads this pacakge into your R session so you can use all the functions it contains. You can think about this step as taking the book down from the shelf and viewing its contents. Assuming you put the book back on the shelf every time you finished reading, then each time you want to reference it, you’ll again have to retreive it from the bookshelf. To view the documention of a package, you can type:

help(package="< packagename >")

In the next section we’ll make use of some of the functions in {}.

1.2 User Defined Functions

While R has a huge selection of functions to choose from, inevitablly we may need to perform a specific task that is availabe in function-from in R. Hence it will be useful to learn how to create a user defined function.

In R, the user can create functions using the general form:

functionName <- function(argument1, argument2=default_value, ...) {
  # comments
  [Statements]
  ...
  return(<something we calculated above>)
}

Here functionName is the name of the function (we need to follow the same rules for naming functions as we did for naming variables; see lab 1) with any arguments (i.e. inputs) the function takes specified in parenthesis. For example, we might define a function called add that takes an input argument x and adds it to the second argument y as follows:

add <- function(x,y){
  x + y 
}

add(2,4)
add(2) # y is missing and will produce and error
add(2,4,3) # too many arguments, will produce and error

If we wanted to, we could give y, for example, a default value. That is, if the user didn’t specify y we would take it to be its default value of 100, say.

add <- function(x,y=100){
  x + y 
}
add(2)
## [1] 102

As a default, R will just print out the last line of the function but its always best to explicity state what you want the funtion to output using the Rf{return} function.

add <- function(x,y){
  out <- x + y 
  return(out)
}
add(2,4) # same as before
## [1] 6

If for some reason I wanted to output x and y as well, I could do so doing:

add <- function(x,y){
  out <- x + y 
  return(list(out,x,y))
}
add(2.5, -8)
## [[1]]
## [1] -5.5
## 
## [[2]]
## [1] 2.5
## 
## [[3]]
## [1] -8

2 Combinatorics

We have learned some helpful combinatorics rules that allow us to count things more efficiently. Let’s have a look at how we can count these in R. The function permutations and combinations are available in the gtool package in R. Once you have loaded this into your R session you can obtain their help files as you would any other function in R (?permutations or ?combinations).

2.1 Example PAUL

Let’s look at a simple example of rearrange the letters in PAUL. First lets store the individual letters in a vector object called word.

word <- c("P","A","U","L")

2.1.1 4-letter words, no repetition

How many “words” can we create from the letters in PAUL using each letter exactly once?

In class we got the solution (\(=4!\)). Note that factorials are calculated in R using the factorial function:

factorial(4)
## [1] 24

Try to get at this answer using the permutation function.

# collection of four letter words (no repeats)
(paul.no.rep <- permutations(n=4,r=4,v=word))
##       [,1] [,2] [,3] [,4]
##  [1,] "A"  "L"  "P"  "U" 
##  [2,] "A"  "L"  "U"  "P" 
##  [3,] "A"  "P"  "L"  "U" 
##  [4,] "A"  "P"  "U"  "L" 
##  [5,] "A"  "U"  "L"  "P" 
##  [6,] "A"  "U"  "P"  "L" 
##  [7,] "L"  "A"  "P"  "U" 
##  [8,] "L"  "A"  "U"  "P" 
##  [9,] "L"  "P"  "A"  "U" 
## [10,] "L"  "P"  "U"  "A" 
## [11,] "L"  "U"  "A"  "P" 
## [12,] "L"  "U"  "P"  "A" 
## [13,] "P"  "A"  "L"  "U" 
## [14,] "P"  "A"  "U"  "L" 
## [15,] "P"  "L"  "A"  "U" 
## [16,] "P"  "L"  "U"  "A" 
## [17,] "P"  "U"  "A"  "L" 
## [18,] "P"  "U"  "L"  "A" 
## [19,] "U"  "A"  "L"  "P" 
## [20,] "U"  "A"  "P"  "L" 
## [21,] "U"  "L"  "A"  "P" 
## [22,] "U"  "L"  "P"  "A" 
## [23,] "U"  "P"  "A"  "L" 
## [24,] "U"  "P"  "L"  "A"

We can obtain the number of permutations by reading the indexed row number on the last line, or by typing the following:

nrow(paul.no.rep)
## [1] 24


2.1.2 4-letter words, repetition allowed

How many “words” can be made from the letters in PAUL when we allow letters to repeat?

To answer this we simply specify the repeats.allowed argument to TRUE.

# collection of four letter words (repeats allowed)
paul.rep <- permutations(n=4,r=4,v=word, repeats.allowed=TRUE)
nrow(paul.rep)
## [1] 256

We could also get this answer using our Fundamental Principle of Counting Theorem (AKA Multiplication of Choices rule). More specifically we have \(n_1 = 4\) choices for letter number 1, \(n_2 = 4\) choices for letter number 2, \(n_3 = 4\) choices for letter number 3, and \(n_4 = 4\) choices for letter number 4. Hence the number of words one can make from the letters in PAUl is: \[\begin{align*} &=\boxed{ n_1 }\cdot \boxed{ n_2}\cdot \boxed{n_3}\cdot \boxed{n_4}\\ &=\boxed{4} \cdot \boxed{4} \cdot \boxed{4}\cdot \boxed{4}\\ & = 4^4 = 256 \end{align*}\] To save space, I won’t print them all out here; instead I’ll use the head/tail functions to print the first/last 6 rows of the output.

head(paul.rep)
##      [,1] [,2] [,3] [,4]
## [1,] "A"  "A"  "A"  "A" 
## [2,] "A"  "A"  "A"  "L" 
## [3,] "A"  "A"  "A"  "P" 
## [4,] "A"  "A"  "A"  "U" 
## [5,] "A"  "A"  "L"  "A" 
## [6,] "A"  "A"  "L"  "L"

tail(paul.rep)
##        [,1] [,2] [,3] [,4]
## [251,] "U"  "U"  "P"  "P" 
## [252,] "U"  "U"  "P"  "U" 
## [253,] "U"  "U"  "U"  "A" 
## [254,] "U"  "U"  "U"  "L" 
## [255,] "U"  "U"  "U"  "P" 
## [256,] "U"  "U"  "U"  "U"


2.1.3 3-letter words repetition allowed

How many three letter words can be made from the letters in PAUL provided none of the letters can repeat?

threLtrWrds <- permutations(n=4,r=3,v=word)
nrow(threLtrWrds)
## [1] 24

We could have also obtained this solution using Multiplication of Choices: \[\begin{align*} &=\boxed{n_1}\cdot \boxed{n_2}\cdot \boxed{n_3}\\ &=\boxed{4} \cdot \boxed{3} \cdot \boxed{2}\\ & = 4*3*2 = 24 \end{align*}\] Funny enough there is the same number of three letter words as four letter words without repetition found in this question! It makes sense once you realize that deleting the last letter of each of the four letter words result in a unique three letter word.


2.1.4 3-letter hands

Suppose there are four scrabble tiles left in a bag: , , , . How many ways are there to select three of them for our scrabble “hand” (hence order doesn’t matter).

To solve this we would need to know the number of combinations rather than permutations. In R this can be done using:

(threeLhands <- combinations(n=4,r=3,v=word))
##      [,1] [,2] [,3]
## [1,] "A"  "L"  "P" 
## [2,] "A"  "L"  "U" 
## [3,] "A"  "P"  "U" 
## [4,] "L"  "P"  "U"
nrow(threeLhands)
## [1] 4

Note that this solution could also be obtained from calculating 4 choose 2 or \({4 choose 2}\). In R we compute this as follows:

choose(4,3)
## [1] 4


2.2 Example OKANAGAN

Now let’s consider the case where we have non-unique letters. That is, some of the letters in our word are repeating. We introduced some Theorems in class to help us to deal with this situation, but lets see if we can find a way do hard code this in R.

First lets save our word to variable names as we did before:

# I use the strsplit function b/c its faster than typing out
# c("O", "K", "A", ...)
word <- as.vector(strsplit("OKANAGAN", "")[[1]])
word
## [1] "O" "K" "A" "N" "A" "G" "A" "N"

2.2.1 permute OKANAGAN (non-unique letters)

How many unique “words” can be made using all the letters of OKANAGAN exactly once?

Suppose the letters were also numbered. \(O K A_1 N_1 A_2 G A_3 N_2\). We know that the following will have repeat “words” since \(O K A_1 N_1 A_2 G A_3 N_2\) would “read” the same as
\(O K A_3 N_2 A_1 G A_2 N_1\) for example. We therefore would need to remove the duplicates in the following:

partI <- permutations(n = length(word), r= length(word), v=word, set=FALSE)
nrow(partI) # too many!
## [1] 40320

Note that we needed to set set = FALSE otherwise, we would have got an error because they repeated letters. To remove the duplicates we could use the unique function.

partII <- unique(partI)
nrow(partII)
## [1] 3360

Let’s verify this solution by hand: The number of ways of arranging \(n\) objects of \(k\) different kinds of which \(n_1\) are of one kind, \(n_2\) are of the second kind, , \(n_k\) are of the \(k\)th kind, and \(n_1 +n_2 +\dots+n_k =n\) is of which \(m\) are identical is \[\frac{n!}{n_1!n_2! \cdots n_k!}\]

Hence we have \(\dfrac{8!}{3!2!} = 3360\) as we have above.


Caution: The number of combinations and permutations increases rapidly with \(n\) and \(r\). To use values of \(n\) above about 45, you will need to increase R’s recursion limit (beyond the scope of this course).

2.3 Exercises

  1. While R has a built in function to calculate factorials (factorial(x)), \(^nC_r\) (choose(n,r)), it does not have a function readily available to calculate \(^nP_r\).
    1. Create your own function called perm to calculate \(^nP_r = \dfrac{n!}{(n-r)!}\).

    2. Test your perm function by calculating \(^{10}P_4\) (your answer should be 5040).

  2. Consider an urn with 6 marbles: one red, one blue, one green, one black, one yellow, one purple.
    1. How many ways could you sample 4 marbles from the urn simultaneously? ie. without replacement and order doesn’t matter.

    2. How many ways could you sample 4 marbles and place them in a row? i.e. from the urn without replacement (this time order matters).

    library(gtools)
    x <- c('red', 'blue', 'green', 'black', 'yellow', 'purple')
    urn <- permutations(n=length(x),r=4,v=x)
    nrow(urn)
    ## [1] 360
    # alternatively
    perm(length(x), 4)
    ## [1] 360

3 Discrete Distribution in R

In lecture we have discussed some special discrete distributions such as the Binomial, and Poisson distribution. We have used the probability distribution function \(f(x)\) (more commonly referred to as a probability mass function, or pmf, for discrete random variables) to calculate the \(P(X=x)\) and cumulative distribution function \(F(x)\), or cdf, to calculate the \(P(X\leq x)\).

R has built-in functions to caclculate these probabilities for a number of well-known distributions. For instance, if \(X\sim\text{Binomal}(n,p)\), the \(f(x) = P(X=x)\) can be calculated using the command
dbinom(\(x\), size=\(n\), prob=\(p\))
and \(F(x) = P(X\leq x)\) can be caclculated using the command
pbinom(\(q\), size=\(n\), prob=\(p\))

More generally, the respective pmf and cdf function in R are computed using d, and p, where is the short-form distribution name. A summary for a few covered in class are provide below:

3.1 Binomial Distribution

Recall from lecture:

  • A Bernoulli trial is an experiment with only two possible outcomes: success and failure. If \(X \sim \text{Bernoulli}(p)\), then

    • \(P(\text{success})=P(X=1)=p\), and
    • \(P(\text{failure})=P(X=0)=1-p\).
  • A random variable \(X\) has a , denoted by \(X\sim \mbox{Bernoulli}(p)\), if and only if it has probability mass function: \[f(x)=p^x(1-p)^{1-x} ~~~~~~\mbox{for }x=0, 1\] where \(0\leq p=P(X=1) \leq 1\).

  • The Binomial Probability Distribution comes about as the result of \(n\) independent Bernoulli trials, each trial having success probability \(p\). The probability of obtaining \(x\) successes in these \(n\) trials is given by \[%P(X=x)= %{^n}C_x f(x ; n, p) = {n \choose x} p^x(1-p)^{n-x}\] for \(x = 0, 1, 2, \dots,n\) where \(0 \leq p \leq 1\) and \(n\) is a non-negative integer.

3.1.1 Fair coin, count Heads

A fair coin is tossed six times.

3.1.1.1 What is the probability of obtaining exactly four heads?

If \(X\) counts the number of heads landing heads up in six tosses of a fair coin, then \(X\sim \text{Binomial}(n = 6,p = 0.5)\). We need to find P\((X=4)\). In , this is calculated:

dbinom(4,size=6,prob=0.5)
## [1] 0.234375


3.1.1.2 Find the probability that at most four heads land face up.

We need to \(P(X\leq 4)\) for RV \(X\sim \text{Binomial}(n = 6,p = 0.5)\). In R that is calculated using:

pbinom(4,size=6,prob=0.5)
## [1] 0.890625

Notice how this is the same as calculating:
\(P(X\leq 4) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)\):

dbinom(0,size=6,prob=0.5) +
  dbinom(1,size=6,prob=0.5) +
  dbinom(2,size=6,prob=0.5) +
  dbinom(3,size=6,prob=0.5) +
  dbinom(4,size=6,prob=0.5) 
## [1] 0.890625

As noted in the help file for dbinom (see ?dbinom), we can actually supply a vector for the first argument. If we supply a vector as the first argument we get a vector of the individual probabilities back. Hence the following code will give a vector containing \(P(X=0)\), \(P(X=1)\), \(P(X=2)\), \(P(X=3)\), \(P(X=4)\):

dbinom(c(0,1,2,3,4),size=6,prob=0.5)
## [1] 0.015625 0.093750 0.234375 0.312500 0.234375

If we take the sum of this output, we get yet another way of calculating \(P(X\leq 4)\):

# N.B. 0:4 creates the vector c(0,1,2,3,4)
sum(dbinom(0:4,size=6,prob=0.5)) 
## [1] 0.890625


3.1.1.3 Find the \(P(X < 4)\).

Note that \(P(X < 4) = P(X \leq 3)\). That is, if there are less than 4 heads facing up, that means that there must be less than or equal 3 heads. Hence the probabilty would be calculated as:

pbinom(3,size=6,prob=0.5)
## [1] 0.65625

which is the same as\ $P(X< 4) = P(X) = P(X=0) + P(X=1) + P(X=2) + P(X=3) $:

sum(dbinom(0:3,size=6,prob=0.5))
## [1] 0.65625


3.1.1.4 Find \(P(X > 4)\).

This can be calculated by specifying lower.tail=FALSE.

pbinom(4, size = 6, prob=0.5, lower.tail = FALSE)
## [1] 0.109375

This is equivalent to:
\(P(X > 4) = 1 - P(X \leq 4)\):

1 - pbinom(4, size = 6, prob=0.5)
## [1] 0.109375

which is also equivalent to:
\(P(X > 4) = P(X = 5) + P(X = 6)\):

dbinom(5,size=6,prob=0.5) + dbinom(6,size=6,prob=0.5)
## [1] 0.109375
# or
sum(dbinom(c(5,6),size=6,prob=0.5))
## [1] 0.109375


3.1.1.5 Find \(P(X \geq 4)\).

Note that \(P(X \geq 4) = P(X > 3)\). That is, if there are 4 or more heads facing up, that means that there must be strictly greater than 3 heads. Hence the probabilty would be calculated as \(P(X > 3)\):

pbinom(3, size = 6, prob=0.5, lower.tail = FALSE)
## [1] 0.34375

Which is equivalent to; \(P(X \geq 4) = 1 - P(X < 4) = 1 - P(X \leq 3)\)

1 - pbinom(3, size=6, prob=0.5)
## [1] 0.34375

Alternatively we could calculate: \(P(X \geq 4) = P(X=4) + P(X=5) + P(X=6)\)

dbinom(4,size=6,prob=0.5) + dbinom(5,size=6,prob=0.5) + 
  dbinom(6,size=6,prob=0.5)
## [1] 0.34375
# or
sum(dbinom(4:6,size=6,prob=0.5))
## [1] 0.34375


Recall that the Binomial distribution is simply the sum of \(n\) independent and identical Bernoulli trials. To put another way, the Bernoulli distribution is a special case of the Binomial distribution, i.e. when \(n=1\). Therefore, we can use these functions to answer Bernoulli distribution questions.

3.1.1.6 Bernoulli answered with Binomial

If \(X\) = 1 if a coin lands heads, and \(X\) = 0 if the coin lands tails, \(X \sim \text{Bernoulli}(p = 0.5)\). What is the probability that the coin lands tails?

Another way to say this is that \(X \sim \text{Binomial}(n = 1, p = 0.5)\). The probability that the coin lands tails, is \(P(X=0)\)

dbinom(x=0,size=1,prob=0.5)
## [1] 0.5


3.2 Poisson Distribution

The poisson distribution is used when we are counting the number of events occurring in a fixed interval of space or time. Let \(X\) count the number of events occurring in a fixed interval, when events occur randomly and independently with constant rate.% \(\lambda\) Then \(X\) follows a Poisson distribution with %rate (or mean) parameter \(\lambda\) having pmf given by: \[\begin{align} f(x; \lambda) =e^{-\lambda}\frac{\lambda^x}{x!}, && x=0,1,2,\ldots \end{align}\]

where \(\lambda > 0.\) This distribution has a single rate parameter, \(\lambda\). Recall that the rate parameter \(\lambda\) will need to be consistent with the interval in the question. That is to say, if we are counting the number of occurrences in a hour, the rate should be in the form of \(\#\) per hour, if we are counting the number of occurrences in 15 minutes, our rate should be in the form of \(\#\) per 15 minutes, etc.

3.2.0.1 Store Example

Customers arrive at an average rate of 3.7/hour. If the store opens at 8:00 what is the probability that there are at least two arrivals by 8:45?

If \(X\) counts the number of arrivals to the store in a 45 minute interval than \(X \sim \text{Poisson}(\lambda = 2.775)\). Notice how we convert the rate from arrivals per hour to arrivals per 45 minutes.

# 3.7/hour = x/45 minutes? x = 
3.7*45/60
## [1] 2.775

\(P(X\geq 2)\) is therefore either calculated by:

  • \(P(X\geq 2) = 1 - P(X < 1)\)
1 - ppois(1, lambda = 2.775)
## [1] 0.7646307
  • \(P(X\geq 2) = P(X > 1)\)
ppois(1, lambda = 2.775, lower.tail = FALSE)
## [1] 0.7646307
  • \(P(X\geq 2) = 1 - [P(X = 0) + P(X = 1)]\)
1 - (dpois(0, lambda = 2.775) + dpois(1, lambda = 2.775))
## [1] 0.7646307
  • Notice how we can’t calculate \(P(X \geq 2) = P(X=2) + P(X=3) + ...\) since the support for \(X\) is countably infinite, i.e. {0,1,2,3,}


3.3 Exercises

  1. Let \(X\sim\text{Binomial}(n=9,p=1/3)\).
    1. Find the probability that \(X\) is less than 5.

    2. Find the probability that \(X\) is greater than 7.

    3. Find the probability that \(X\) is greater than or equal to 4.

  2. A website receives 120 hits per hour.
    1. Find the probability that it receives 4 hits in a given minute.

    2. Find the probability that it receives at least 100 hits in 45 minutes.