RObama
a competition for Pittsburgh’s R programmers

Rules

Must be a Pittsburgh useR group member to compete, but can collaborate with non-members. (Open to students, teachers, researchers, professionals, and hobbyists.)
Can work alone or in a team (recommended) of at most three people.
Must only use documents we provide. May use a subset of the documents. Submissions employing additional data sources will be disqualified.
Can use any^✭ model, ranging from simple Markov chains to more complicated ones aware of grammar rules and how language actually works.
Winning individual or team will present on their work.
Must be respectful and not use this as a platform for hate speech or expressing political views.

^✭ While generating speech via randomly sampled blocks of text from the provided documents is, technically, a valid approach (albeit not an impressive one), the caveat is that each block must not be longer than one sentence.
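As a rough illustration of the simplest model the rules mention, a first-order, word-level Markov chain could be sketched as follows. The names `build_chain` and `generate_sentence`, the `<START>` and `<END>` tokens, and the toy sentences are all hypothetical; a real entry would train on the provided documents.

```r
# Hypothetical toy corpus; a real entry would use sentences from the provided documents.
sentences <- c(
  "we have to change the law",
  "we have to remain focused",
  "people have to change"
)

# Build a first-order transition table: for each word, the words observed after it.
build_chain <- function(sentences) {
  chain <- list()
  for (s in sentences) {
    words <- c("<START>", strsplit(s, " +")[[1]], "<END>")
    for (i in seq_len(length(words) - 1)) {
      chain[[words[i]]] <- c(chain[[words[i]]], words[i + 1])
    }
  }
  chain
}

# Walk the chain from <START> until <END> (or a word cap) is reached.
generate_sentence <- function(chain, max_words = 50) {
  out <- character(0)
  current <- "<START>"
  while (length(out) < max_words) {
    candidates <- chain[[current]]
    # sample.int() on the index avoids sample()'s special case for length-one input.
    nxt <- candidates[sample.int(length(candidates), 1)]
    if (nxt == "<END>") break
    out <- c(out, nxt)
    current <- nxt
  }
  paste(out, collapse = " ")
}

chain <- build_chain(sentences)
set.seed(1)
generate_sentence(chain)
```

Because successors are stored with repetition, frequent transitions are sampled proportionally more often without any explicit probability table.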

Data

To train your algorithms, we have prepared 8 text documents containing a selection of Obama’s words from various speeches and interviews. The documents are available as an 82KB ZIP archive.
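A minimal sketch of the import step, assuming the archive has already been extracted (for example with `unzip()`) into a directory of plain-text `.txt` files. The archive's actual layout is not described here, so the directory, the file extension, and the helper name `read_sentences` are assumptions, and the sentence-splitting rule is only a crude heuristic.

```r
# Read every .txt file in a directory and split the text into sentences.
# Assumption: the documents are plain-text .txt files located in `dir`.
read_sentences <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  lines <- unlist(lapply(files, readLines, warn = FALSE, encoding = "UTF-8"))
  # Drop blank lines and join the rest into one string.
  text <- paste(lines[nzchar(trimws(lines))], collapse = " ")
  # Crude heuristic: break after ., ! or ? followed by whitespace.
  sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))
  trimws(sentences)
}

# Demonstration with a throwaway file standing in for one of the documents.
demo_dir <- file.path(tempdir(), "robama_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("There are only so many shortcuts. Ultimately, we have to change the law.",
             "",
             "And people have to remain focused on that."),
           file.path(demo_dir, "demo.txt"))
read_sentences(demo_dir)
```

A rule this simple will mis-split on abbreviations such as "U.S.", so real speech text would likely need extra clean-up.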

Submission

Each submission must be emailed to Mikhail Popov (mikhail@mpopov.com) by 11:59pm on May 20th as a single R script that, when run, will use the documents we provided (above) to create a self-contained function speech. This function must generate an Obama speech that is n sentences long.

speech must not rely on any objects outside of it. All objects that are not speech will be removed from the global environment before speech is used to generate text. See instructions below for creating a self-contained function.

Example

speech(3)

## [1] "There are only so many shortcuts."         
## [2] "Ultimately, we have to change the law."    
## [3] "And people have to remain focused on that."

Evaluation

The submissions will be evaluated by the Pittsburgh useR group organizers (for code readability, performance, and memory usage) and competing teams (blind peer review).

Weighted Scoring

Code Readability (5%)
All submissions will be uploaded to the Pittsburgh useR group repository on GitHub for group members to learn from, so formatting and commenting are important. See Hadley Wickham’s style guide for suggestions on writing readable R code.
Note: participants may request to have their submission published anonymously.

Training Performance (15%)
We will measure the speed of importing the provided documents, cleaning up and manipulating the data, and training the algorithm.

Memory Usage (15%)
A more compact speech object will score higher.

Data Generative Performance (15%)
We will measure the speed of generating a speech.

Blind Peer Review (50%)
The most important deciding factor will be the competing teams themselves. We will generate a short speech using the same seed for each group. Each generated text will be run through a text-to-speech engine and saved as a randomly numbered mp3 file. The audio files will be sent out to the teams, who will rank them from least believable to most believable without knowing which audio file belongs to which team.
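The organizers' exact measurement procedure is not described, so the following is only one way a team might estimate the three performance criteria for its own entry before submitting. `train_model` and the toy data are hypothetical stand-ins. The sketch uses the length of the `serialize()` output as a size proxy because `object.size()` does not count a function's enclosing environment, which is where a self-contained speech keeps its data.

```r
# Hypothetical stand-in for a training step: returns a self-contained speech().
train_model <- function(sentences) {
  force(sentences)
  function(n = 1) sample(sentences, n, replace = TRUE)
}

toy <- c("There are only so many shortcuts.",
         "Ultimately, we have to change the law.",
         "And people have to remain focused on that.")

# Training performance: elapsed seconds to build the function.
train_time <- system.time(speech <- train_model(toy))[["elapsed"]]

# Memory usage: serialized size in bytes, which includes the captured data.
speech_bytes <- length(serialize(speech, NULL))

# Generative performance: elapsed seconds to generate a speech.
gen_time <- system.time(out <- speech(100))[["elapsed"]]

# Fixing the seed makes the generated speech reproducible across runs.
set.seed(20); a <- speech(3)
set.seed(20); b <- speech(3)
identical(a, b)
```

For a toy model the timings will be near zero; repeating the calls many times and averaging would give steadier numbers.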

Prize

A one (1) year subscription to shinyapps.io Standard Edition (a $1,100 value): Unlimited Applications, 1,000 Active Hours, Authentication, Multiple Instances, and Email Support.

Runners-up will get a variety of prizes, including Hands-On Programming with R by Garrett Grolemund.

RStudio is a trademark of RStudio, Inc.

Winners

We received a single submission from Taylor Pospisil and Lee Richardson, which can be found at https://github.com/Pittsburgh-useR-Group/RObama/tree/master/submission.

Instructions for Creating Self-Contained Functions

You can use the following template and accompanying example for creating a function that contains all the data and models it needs:

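# Factory: returns a function that carries Object1 and Object2 with it.
make <- function(Object1, Object2) {
  # Evaluate the arguments now so the values survive rm() of the originals.
  force(Object1)
  force(Object2)
  return(function(n = NULL) {
    # Look up the captured objects through the enclosing environment.
    obj1 <- get('Object1', environment())
    obj2 <- get('Object2', environment())
    str(obj1)
    str(obj2)
    ls()  # lists only the objects local to this call
  })
}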

Let’s see it in action:

set.seed(0)
str(x <- rnorm(10))

##  num [1:10] 1.263 -0.326 1.33 1.272 0.415 ...

str(y <- rnorm(10))

##  num [1:10] 0.764 -0.799 -1.148 -0.289 -0.299 ...

speech <- make(x,y)
rm(x,y,make)
ls()

## [1] "speech"

speech()

##  num [1:10] 1.263 -0.326 1.33 1.272 0.415 ...
##  num [1:10] 0.764 -0.799 -1.148 -0.289 -0.299 ...

## [1] "n"    "obj1" "obj2"
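Applying the same template to the competition's task, a deliberately naive entry might capture a vector of sentences and sample from it, which stays within the one-sentence-per-block caveat in the rules. The factory name `make_speech` and the three sentences (borrowed from the example output earlier) are illustrative assumptions, not a recommended model.

```r
# Factory: returns speech(), which keeps `sentences` in its enclosing environment.
make_speech <- function(sentences) {
  force(sentences)  # evaluate now so rm() of the original cannot break speech()
  function(n = 1) {
    # Sample n sentences with replacement; indexing via sample.int() avoids
    # sample()'s special handling of length-one inputs.
    sentences[sample.int(length(sentences), n, replace = TRUE)]
  }
}

corpus <- c("There are only so many shortcuts.",
            "Ultimately, we have to change the law.",
            "And people have to remain focused on that.")

speech <- make_speech(corpus)
rm(corpus, make_speech)  # mimic the clean-up described in the submission rules

speech(3)                # still works: the data travels with the function
ls(environment(speech))  # the only captured object
```

Inspecting `environment(speech)` is a quick way to confirm that nothing the function needs is left behind in the global environment.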

Objective