[dot] PATH: 2013

Wednesday, February 27, 2013

Parameters for Negative Binomial

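The negative binomial is a useful distribution for modeling the number of failures r that occur before the k-th success in a sequence of independent trials, each with success probability p. Equivalently, it gives the probability that the k-th success lands exactly on trial q = k + r.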

To put this into context:

Let's say that Joe is selling candy bars in his local neighborhood. He must sell 5 candy bars to meet his quota. The chance that a household will buy his candy is 0.4. What is the probability that he sells his 5th candy bar at the 10th household?

The parameters extracted from this problem are as follows:

\[r=5=\textrm{number of failures }
\\k=5=\textrm{number of successes (aka size)}
\\p=0.4=\textrm{probability of success}
\\k+r=10=q=\textrm{total number of trials}\]

The PDF (strictly a probability mass function, since the distribution is discrete) for the negative binomial is
\[P(X=r)={{r+k-1} \choose k-1} (p)^k (1-p)^r\]

Evaluating this gives 0.100329, which means that Joe has about a 10% chance of selling his 5th candy bar at the 10th house. Put another way, there is about a 10% chance that his first 9 visits produce exactly 4 sales and 5 refusals, and the 10th visit produces the 5th sale.
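Substituting the candy bar parameters into the formula shows where that value comes from:

\[{{5+5-1} \choose 5-1} (0.4)^5 (1-0.4)^5 = 126 \times 0.01024 \times 0.07776 \approx 0.100329\]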

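In R, pnbinom gives the CDF: the probability of at most q failures before the 5th success. The code below evaluates it for q = 1 through 10 failures.

Note that the 10th house corresponds to q = 5 (5 failures and 5 successes), so the candy bar example is the 5th value in the output.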

pnbinom(1:10,5,0.4)

[1] 0.0409600 0.0962560 0.1736704 0.2665677 0.3668967 0.4672258 0.5618218

[8] 0.6469582 0.7207430 0.7827223

and the PDF

dnbinom(1:10,5,0.4)

 [1] 0.03072000 0.05529600 0.07741440 0.09289728 0.10032906 0.10032906

 [7] 0.09459597 0.08513638 0.07378486 0.06197928
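The formula can also be cross-checked outside R. The following is only a sketch in shell and awk: nbinom_pmf is a hypothetical helper name, not an R or system function, and it takes its arguments in the same order as dnbinom (failures, successes, probability).

```shell
# nbinom_pmf FAILURES SUCCESSES PROB
# Hypothetical helper (not part of R): evaluates C(r+k-1, k-1) * p^k * (1-p)^r
nbinom_pmf() {
  awk -v r="$1" -v k="$2" -v p="$3" 'BEGIN {
    c = 1
    # Build the binomial coefficient C(r+k-1, k-1) one factor at a time
    for (i = 1; i <= k - 1; i++) c = c * (r + i) / i
    printf "%.6f\n", c * p^k * (1 - p)^r
  }'
}

nbinom_pmf 5 5 0.4   # prints 0.100329, the 5th value of dnbinom(1:10,5,0.4)
```

The result is rounded to six decimal places, so it will not show the full precision that R prints.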

Parameters for R

dnbinom(x, size, prob)
x - the number of failures (pnbinom calls this argument q)
size - the target number of successes
prob - the probability of success on each trial

Another quick example:

Joe decides it's time for a new job. Instead of selling candy bars he now sells notebooks. If he sells three notebooks, then he will meet his quota. However, the probability that he will sell a notebook is now 0.09. What is the probability that he sells his third notebook at the 10th house? That means 7 failures before the 3rd success, so:

dnbinom(7,3,0.09)
0.013561876
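The same value falls out of the formula by hand, with r = 7 failures, k = 3 successes and p = 0.09 (the last factor is rounded to six decimal places):

\[{{7+3-1} \choose 3-1} (0.09)^3 (1-0.09)^7 = 36 \times 0.000729 \times 0.516761 \approx 0.013562\]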

Wednesday, January 16, 2013

Some more useful commands

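Take the name of a file: basename strips the directory, and cut strips everything from the first dot on. Note that cut reads the contents of any file it is given, so the name has to be piped in.
basename <File>
or
echo file.pdb | cut -f1 -d '.'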
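A small sketch of how the two tools behave on a file name. The path is made up for illustration, and the file does not need to exist, because both commands only manipulate the string.

```shell
# Hypothetical path, used only for illustration
path="/data/structures/1abc.pdb"

with_ext=$(basename "$path")                   # directory removed
no_ext=$(basename "$path" .pdb)                # directory and the given suffix removed
via_cut=$(basename "$path" | cut -f1 -d '.')   # everything from the first dot on removed

echo "$with_ext"   # 1abc.pdb
echo "$no_ext"     # 1abc
echo "$via_cut"    # 1abc
```

The cut variant also truncates names that contain more than one dot, so archive.tar.gz becomes archive. The basename variant removes only the suffix it is given.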

print the nth column (here n = 2; $0 would print the whole line)

awk '{print $2}'

remove all dashes

sed "s/-//g"

remove new lines

awk '{printf "%s", $0}'

skips first line

sed 1d <File>

Uses each line of the second file as a search pattern and prints the lines of the first file that match any of them.

grep -f temp2.txt temp.txt
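A minimal sketch of pattern-file matching. The file contents are invented for illustration and are written to a scratch directory.

```shell
# Hypothetical files created in a scratch directory for illustration
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/temp.txt"
printf 'an\nch\n' > "$tmp/temp2.txt"

# Each line of temp2.txt is a pattern; a line of temp.txt is printed
# if it matches at least one of them
matches=$(grep -f "$tmp/temp2.txt" "$tmp/temp.txt")
echo "$matches"   # banana and cherry, one per line

rm -r "$tmp"
```

The patterns are treated as regular expressions by default. Adding -F makes grep treat them as fixed strings, which is usually what is wanted for lists of identifiers.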

Moves all files in subdirectories to another folder (the explicit starting point "." keeps the command portable to find versions that require one).

find . -mindepth 2 -type f -print -exec mv {} ./newDir/ \;
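A sketch of the move on a throwaway directory tree. The directory and file names are invented, and the extra "! -path" test is an addition that is not in the original command: it stops find from trying to move files that already sit in newDir.

```shell
# Hypothetical tree built in a scratch directory
work=$(mktemp -d)
mkdir -p "$work/a" "$work/b/c" "$work/newDir"
touch "$work/top.txt" "$work/a/one.txt" "$work/b/c/two.txt"

(
  cd "$work" || exit 1
  # -mindepth 2 skips files directly in the starting directory (top.txt)
  find . -mindepth 2 -type f ! -path './newDir/*' -exec mv {} ./newDir/ \;
)

moved=$(ls "$work/newDir")
echo "$moved"   # one.txt and two.txt, one per line
# Remove the scratch tree with: rm -r "$work"
```

Files with the same name in different subdirectories overwrite each other in the target folder, so check for duplicates before running this on real data.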

Print everything except the first column, followed by the first column at the end (drop ", first" from the print to omit it entirely)

awk '{first = $1; $1 = ""; print $0, first; }'
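A quick sketch of what this awk command produces on a sample line, next to a variant that drops the first column completely. The sample line and variable names are only for illustration.

```shell
line='a b c'

# Original command: blank the first field, then print it after the rest
rotated=$(printf '%s\n' "$line" | awk '{first = $1; $1 = ""; print $0, first; }')

# Variant: blank the first field and strip the separator left behind
dropped=$(printf '%s\n' "$line" | awk '{$1 = ""; sub(/^ /, ""); print}')

printf '[%s]\n' "$rotated" "$dropped"
# [ b c a]
# [b c]
```

Blanking $1 leaves its field separator in place, which is why the first result starts with a space.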

Prints each line that matches the search item, plus one line after it

grep -A1 key "$file"

Remove all lines which don't begin with Y.....[WC] (Y, any five characters, then W or C)

sed '/^Y.....[WC]/!d' AnnotatedOrfs.tab > CodingOrfs.tab

Delete all double quotes

tr -d "\""

Take a look only at characters 17-20

cut -c 17-20

If the 17th position has a B, then delete the line

cat file.txt | sed -r '/^(.{16})B(.*)$/d'

If the 17th position has an A replace it with a space

cat file.txt | sed -r 's/^(.{16})A(.*)$/\1 \2/'
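The three position-based commands above can be tried on a few made-up lines. This sketch uses sed -E, which both GNU and BSD sed accept, in place of the GNU-only -r. The sample lines are invented so that the 17th character is easy to spot.

```shell
# Three hypothetical lines: 16 filler characters, then A, B or C in position 17
input=$(printf '%s\n' \
  '0123456789abcdefAxyz' \
  '0123456789abcdefBxyz' \
  '0123456789abcdefCxyz')

# Characters 17-20 of every line
window=$(printf '%s\n' "$input" | cut -c 17-20)

# Drop lines with B in position 17, then blank out an A in position 17
result=$(printf '%s\n' "$input" |
  sed -E '/^.{16}B/d' |
  sed -E 's/^(.{16})A(.*)$/\1 \2/')

echo "$window"
echo "$result"
```

The first echo prints Axyz, Bxyz and Cxyz on separate lines. The second prints the A line with a space in position 17, followed by the untouched C line.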