Wednesday, February 27, 2013

Parameters for Negative Binomial

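The negative binomial is a useful distribution for modeling the number of failures r that occur before the k-th success in a sequence of independent trials, each with success probability p. Equivalently, it gives the probability that the k-th success occurs on trial q = k + r.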

To put this into context:

Let's say that Joe is selling candy bars in his local neighborhood. He must sell 5 candy bars to meet his quota. The chance that a household will buy his candy is 0.4. What is the probability that he sells his 5th candy bar at the 10th household?

The parameters extracted from this problem are as follows:

\[\begin{aligned}
r &= 5 = \textrm{number of failures}\\
k &= 5 = \textrm{number of successes (aka size)}\\
p &= 0.4 = \textrm{probability of success}\\
q &= k + r = 10 = \textrm{total number of trials}
\end{aligned}\]


The probability mass function (PMF) of the negative binomial, i.e. the probability of exactly r failures before the k-th success, is
\[P(X = r) = {{r+k-1} \choose {k-1}} p^k (1-p)^r\]
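As a sanity check on the formula, here is a minimal sketch that evaluates it directly with awk for Joe's numbers (r = 5, k = 5, p = 0.4). The function name `nb_pmf` is hypothetical; it simply builds the binomial coefficient as a running product and multiplies in the two powers.

```shell
# nb_pmf R K P: probability of exactly R failures before the K-th success,
# choose(R+K-1, K-1) * P^K * (1-P)^R  (hypothetical helper name)
nb_pmf() {
  awk -v r="$1" -v k="$2" -v p="$3" 'BEGIN {
    # choose(r+k-1, k-1) as the product (r+1)(r+2)...(r+k-1)/(k-1)!
    c = 1
    for (i = 1; i <= k - 1; i++) c = c * (r + i) / i
    printf "%.6f\n", c * p^k * (1 - p)^r
  }'
}

nb_pmf 5 5 0.4   # Joe: 5 refusals before the 5th sale, prints 0.100329
```

Building the coefficient as a product of small ratios keeps every intermediate value an exact integer (6, 21, 56, 126 here), so no large factorials are needed.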

Plugging in the numbers:
\[{9 \choose 4}(0.4)^5(0.6)^5 = 126 \times 0.01024 \times 0.07776 \approx 0.100329\]
So Joe has about a 10% chance of selling his 5th candy bar at the 10th house. Put another way, it is the probability that he gets exactly 4 sales and 5 refusals at the first 9 houses and then makes a sale at the 10th.

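In R, pnbinom gives the CDF, the probability of at most a given number of failures before the 5th success. The code below evaluates it for 1 through 10 failures.

The 5th entry (5 failures) is the case from the example: by the 10th house, Joe has 5 failures and 5 successes.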
pnbinom(1:10,5,0.4)
 [1] 0.0409600 0.0962560 0.1736704 0.2665677 0.3668967 0.4672258 0.5618218
 [8] 0.6469582 0.7207430 0.7827223
and the PMF, whose 5th entry is the 0.100329 computed above:
dnbinom(1:10,5,0.4)
 [1] 0.03072000 0.05529600 0.07741440 0.09289728 0.10032906 0.10032906
 [7] 0.09459597 0.08513638 0.07378486 0.06197928
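pnbinom(q) is the running total of dnbinom from 0 failures up to q. That is why pnbinom(1, 5, 0.4) = 0.04096 is larger than dnbinom(1, 5, 0.4) = 0.03072: it also includes the 0-failure term 0.4^5 = 0.01024. The small awk sketch below reproduces the CDF this way. The helper name `nb_table` is hypothetical, and this is only an illustration of the relationship, not how R computes it internally.

```shell
# nb_table K P MAX: for x = 0..MAX failures, print x, PMF(x) and the running
# CDF, i.e. the sum of PMF(0..x). Hypothetical helper for illustration.
nb_table() {
  awk -v k="$1" -v p="$2" -v max="$3" 'BEGIN {
    cdf = 0
    for (x = 0; x <= max; x++) {
      # choose(x+k-1, k-1) as the product (x+1)(x+2)...(x+k-1)/(k-1)!
      c = 1
      for (i = 1; i <= k - 1; i++) c = c * (x + i) / i
      pmf = c * p^k * (1 - p)^x
      cdf += pmf
      printf "%d %.7f %.7f\n", x, pmf, cdf
    }
  }'
}

nb_table 5 0.4 5
```

The third column for x = 1 to 5 should line up with the first five pnbinom values (0.0409600 through 0.3668967).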


Parameters in R

dnbinom(x, size, prob) and pnbinom(q, size, prob)
x, q - the number of failures before the size-th success
size - the number of successes
prob - the probability of success on each trial

Another quick example:

Joe decides it's time for a new job. Instead of selling candy bars, he now sells notebooks. If he sells three notebooks, he meets his quota. However, the probability that a household buys a notebook is only 0.09. What is the probability that he sells his third notebook at the 10th house?

dnbinom(7,3,0.09)
0.013561876
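dnbinom takes the number of failures, not the house number. The 7 in the call is therefore 10 houses minus 3 sales. The sketch below does that conversion for you as a shell function around awk. The name `sale_on_house` is hypothetical, and the coefficient is computed with the same product formula as the PMF above.

```shell
# sale_on_house N K P: probability that the K-th success lands exactly on
# trial N (hypothetical helper). Converts N into failures (N - K) and
# evaluates choose(N-1, K-1) * P^K * (1-P)^(N-K).
sale_on_house() {
  awk -v n="$1" -v k="$2" -v p="$3" 'BEGIN {
    r = n - k                       # failures before the K-th success
    if (r < 0) { printf "%.8f\n", 0; exit }
    c = 1
    for (i = 1; i <= k - 1; i++) c = c * (r + i) / i
    printf "%.8f\n", c * p^k * (1 - p)^r
  }'
}

sale_on_house 10 3 0.09   # notebooks: 3rd sale at house 10, prints 0.01356188
sale_on_house 10 5 0.4    # candy bars: 5th sale at house 10, prints 0.10032906
```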






Wednesday, January 16, 2013

Some more useful commands

Strip the directory and extension from a file name
basename path/to/file.pdb .pdb
or
echo file.pdb | cut -f1 -d '.'
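The sketch below shows the usual ways to strip the directory and the extension inside a script. The path and file name are only placeholders. `basename` accepts an optional suffix to remove, and POSIX parameter expansion does the same job without starting a new process.

```shell
f=/data/structures/file.pdb   # placeholder path

basename "$f"            # file.pdb  (directory removed)
basename "$f" .pdb       # file      (directory and .pdb suffix removed)

name=${f##*/}            # file.pdb  (strip everything up to the last /)
echo "${name%.*}"        # file      (strip the last .extension)

# cut keeps everything before the FIRST dot, so a name like file.v2.pdb
# becomes "file", whereas ${name%.*} would give "file.v2".
echo file.v2.pdb | cut -f1 -d '.'   # file
```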

Print the nth column (here n = 2; $0 would print the whole line)

awk '{print $2}'
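To choose the column at run time, awk's `-v` option can pass the column number in as a variable, and `$n` then refers to that field. The sketch below uses made-up input. The wrapper name `nth_col` is hypothetical; the awk call is the part that matters.

```shell
# nth_col N: print the N-th whitespace-separated column of stdin
# (hypothetical helper name).
nth_col() {
  awk -v n="$1" '{print $n}'
}

printf 'ATOM 1 N\nATOM 2 CA\n' | nth_col 2                 # prints 1 and 2
printf 'a,b,c\nd,e,f\n' | awk -F',' -v n=3 '{print $n}'    # prints c and f
```

For other delimiters, `-F` sets the field separator, as in the comma-separated example.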

Remove all dashes
sed "s/-//g"

Remove newlines (join all lines into one)
awk '{printf "%s", $0}'
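The awk one-liner prints every line without its newline, so the result has no trailing newline either. `tr -d '\n'` is an equivalent alternative. Here is a quick check on made-up input:

```shell
seq_lines='ACGT
TTGA
CC'

# Both commands join the lines into one string with no newlines.
joined_awk=$(printf '%s\n' "$seq_lines" | awk '{printf "%s", $0}')
joined_tr=$(printf '%s\n' "$seq_lines" | tr -d '\n')

echo "$joined_awk"   # ACGTTTGACC
echo "$joined_tr"    # ACGTTTGACC
```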

Skip the first line
sed 1d <File>

Use each line of the second file as a search pattern
grep -f temp2.txt temp.txt
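With `-f`, each line of temp2.txt is treated as a separate pattern, and a line of temp.txt is printed if it matches any of them. When the second file holds IDs, adding `-F` (fixed strings) and `-x` (whole-line match) is often what you want. The sketch below uses throwaway files in a temporary directory, with made-up contents.

```shell
dir=$(mktemp -d)
printf 'YAL001C\nYAL002W\nYBR010W\n' > "$dir/temp.txt"   # lines to search
printf 'YAL002W\nYBR010\n' > "$dir/temp2.txt"             # one pattern per line

# Substring match: YAL002W and YBR010W (YBR010 is a prefix of YBR010W)
loose=$(grep -f "$dir/temp2.txt" "$dir/temp.txt")
# Fixed strings, whole-line match: YAL002W only
exact=$(grep -Fxf "$dir/temp2.txt" "$dir/temp.txt")

printf '%s\n' "$loose"
printf '%s\n' "$exact"
rm -r "$dir"
```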

Move all files in subdirectories into another folder (files already inside newDir are skipped)
find . -mindepth 2 -type f ! -path './newDir/*' -print -exec mv {} ./newDir/ \;
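The sketch below runs the move inside a throwaway directory, so nothing real is touched. The `! -path './newDir/*'` test excludes the destination. Without it, files already in newDir, which are also at depth 2, would match too. Files with the same name in different subdirectories will overwrite each other in newDir.

```shell
# Build a throwaway tree: top.txt stays put, a/x.txt and a/b/y.txt move.
cd "$(mktemp -d)" || exit 1
mkdir -p a/b newDir
touch top.txt a/x.txt a/b/y.txt

# Move every regular file at depth >= 2 into ./newDir, skipping newDir itself.
find . -mindepth 2 -type f ! -path './newDir/*' -exec mv {} ./newDir/ \;

ls newDir    # x.txt and y.txt
```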

Print everything except the first column
awk '{$1 = ""; print substr($0, 2)}'

Move the first column to the end of the line
awk '{first = $1; $1 = ""; print substr($0, 2), first}'
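Assigning an empty string to `$1` makes awk rebuild the line with single spaces between fields. This leaves a leading space where the first field was, and `substr($0, 2)` drops it. Here is a sketch on a made-up line that shows both dropping the first column and moving it to the end:

```shell
line='chr1 100 200 geneA'   # made-up sample record

# Drop the first column.
echo "$line" | awk '{$1 = ""; print substr($0, 2)}'                      # 100 200 geneA

# Move the first column to the end instead.
echo "$line" | awk '{first = $1; $1 = ""; print substr($0, 2), first}'   # 100 200 geneA chr1

# cut gives the same drop when fields are separated by exactly one space.
echo "$line" | cut -d ' ' -f 2-                                          # 100 200 geneA
```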

Print each matching line plus the one line after it
grep -A1 key "$file"
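`-A1` prints each matching line followed by the line after it. When matches are not adjacent, GNU grep separates the groups with a `--` line. A typical use is pulling one sequence out of a FASTA file by its header. The records below are made up.

```shell
# Two made-up single-line FASTA records
fasta='>seq1
ACGTACGT
>seq2
TTTTGGGG'

# Header line matching "seq2" plus the sequence line after it
printf '%s\n' "$fasta" | grep -A1 'seq2'
# >seq2
# TTTTGGGG
```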

Remove all lines that don't begin with Y.....[WC]
sed '/^Y.....[WC]/!d' AnnotatedOrfs.tab > CodingOrfs.tab
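Systematic yeast ORF names such as YAL001C consist of a Y, five more characters, and then W or C for the Watson or Crick strand, which is what `^Y.....[WC]` tests. The sketch below runs the sed filter on made-up input in a temporary directory and checks that an equivalent grep selects the same lines.

```shell
cd "$(mktemp -d)" || exit 1
# Made-up sample: two ORF-style names and one non-ORF line
printf 'YAL001C\tgeneA\nYBR010W\tgeneB\ntRNA-Ala\tgeneC\n' > AnnotatedOrfs.tab

# Keep only lines that start with an ORF-style name
sed '/^Y.....[WC]/!d' AnnotatedOrfs.tab > CodingOrfs.tab

# grep selects the same lines without the "delete unless matched" idiom
grep '^Y.....[WC]' AnnotatedOrfs.tab | cmp - CodingOrfs.tab && echo identical
```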

Delete all double quotes (for parentheses, use tr -d '()')
tr -d "\""

Show only characters 17-20 of each line
cut -c 17-20 

If the 17th position has a B, then delete the line
cat file.txt | sed -r '/^(.{16})B(.*)$/d'

If the 17th position has an A replace it with a space
cat file.txt | sed -r 's/^(.{16})A(.*)$/\1 \2/'
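Because these commands work on fixed character positions, they suit fixed-width formats. The sketch below chains the two sed edits on three made-up fixed-width lines and then checks column 17 with cut. It uses `sed -E`, which GNU and BSD sed both accept for extended regular expressions; the `-r` spelling above is GNU-only.

```shell
cd "$(mktemp -d)" || exit 1
# Three made-up lines; printf pads the first 16 columns so that the
# character in column 17 is A, B and a blank respectively.
printf '%-16s%s\n' 'ATOM      1  N' 'AMET' \
                   'ATOM      2  N' 'BMET' \
                   'ATOM      3  CA' ' MET' > file.txt

# Drop lines with B in column 17, then replace an A there with a space.
sed -E '/^(.{16})B(.*)$/d' file.txt | sed -E 's/^(.{16})A(.*)$/\1 \2/' > cleaned.txt

cut -c 17-20 cleaned.txt   # " MET" on both remaining lines
```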