27 February 2006

The Freshman -- nytimes [clip]

Perhaps I'll run into him at the dining hall at some point. Don't know what I'd say though!

The Freshman
Published: February 26, 2006
Sometimes walking up College Street, when the bells were ringing in Harkness Tower and the light on the gabled dorms and leafy quads made the whole campus seem part of some Platonic dream, he could almost forget that there were people back home who would be happy to kill him.
His formal introduction to the terrain of the Western mind came in July at the start of the summer term; most of the class of '09 would not arrive until the fall term. He was glad for the chance to get his bearings. The direction of Mecca he knew from the compass on his watch. For local attractions he had a map of the campus; he got a cellphone, a Yale e-mail account. His student ID card admitted him to lots of campus dining halls...

ooo[clip]ooo ooo[fun]ooo ooo[Yale]ooo

The O'Hern Group | The O'Hern Group

26 February 2006

Clustering - K-means demo

Used in Data Mining (CS445)


The New York City Parades - June - JimsDeli NYC Guide

Parades in NY


FlatPrice Moving

Avi Mover


23 February 2006

Barriers to progress in systems biology -- Nature [clip]

Emphasizes the importance of standards!

Nature 438, 1079 (22 December 2005) | doi:10.1038/4381079a
Barriers to progress in systems biology
Marvin Cassman 1,a
1. Marvin Cassman lives in San Francisco, California, USA.
a Co-authors are Adam Arkin of the Bioengineering Department, University of California, Berkeley; Fumiaki Katagiri of the Department of Plant Biology, University of Minnesota, St Paul; Douglas Lauffenburger of the Biological Engineering Division, Massachusetts Institute of Technology, Cambridge; Frank J. Doyle III of the Department of Chemical Engineering, University of California, Santa Barbara; and Cynthia L. Stokes who is at Entelos, Foster City, California.
For the past half-century, biologists have been uncovering details of countless molecular events. Linking these data to dynamic models requires new software and data standards, argue Marvin Cassman and his colleagues.
The field of systems biology is lurching forwards, propelled by a mixture of faith, hope and even charity. But if it is to become a true discipline, several problems with core infrastructure (data and software) need to be addressed. In our view, they are too critical to be left to ad hoc developments by individual laboratories...

ooo[clip]ooo ooo[bioinfo]ooo ooo[networks]ooo

Here I Am Taking My Own Picture -- Times.com [clip]

Looks like this digital camera phenomenon is quite widespread

Here I Am Taking My Own Picture
Published: February 19, 2006
MORGAN ADAMS, a recent college graduate, decided that her picture on her home page at MySpace.com had lingered a little too long, a full month. To snap a new one she called on the only photographer she thought she could trust: herself.
In her bedroom in Lubbock, Tex., Ms. Adams, 21, tried out a variety of poses — coy, friendly, sultry, goofy — in the kind of performance young people have engaged in privately for generations before a mirror. But Ms. Adams's mirror was a Web cam, and her journey of self-expression, documented in five digital self-portraits, was soon visible to the 56 million registered users of MySpace.


ooo[clip]ooo ooo[fun]ooo

Machine Learning textbook slides

Chap. 4 from the Machine Learning book was the lecture in class


Useful lecture on Perceptron

For data mining course (CS445)


YaleInfo 2.5.1 - New Portal with UPI number


22 February 2006

MinGW - Home

Useful for compiling Linux programs under XP


Calories in a plain bagel [fact]

More food thoughts... found this stat, which I thought was quite interesting:
290 calories with 20 from fat for a plain bagel (2g fat, 56g carbs). With 1 oz of cream cheese this goes to
390 and 120, respectively. Rather amazing how much there is in a bagel.
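To put those numbers side by side, here is a quick sketch of the arithmetic in Python. It uses only the four calorie figures quoted above; the function and variable names are made up for illustration.

```python
def fat_share(total_calories, fat_calories):
    """Return the fraction of total calories that come from fat."""
    return fat_calories / total_calories


# Figures quoted in the note above (calories, calories from fat).
plain = {"total": 290, "fat": 20}
with_cream_cheese = {"total": 390, "fat": 120}

# What the 1 oz of cream cheese adds on its own.
added_total = with_cream_cheese["total"] - plain["total"]
added_fat = with_cream_cheese["fat"] - plain["fat"]

print(f"Cream cheese adds {added_total} calories, {added_fat} of them from fat")
print(f"Plain bagel: {fat_share(plain['total'], plain['fat']):.1%} of calories from fat")
print(
    "With cream cheese: "
    f"{fat_share(with_cream_cheese['total'], with_cream_cheese['fat']):.1%} of calories from fat"
)
```

So by these figures the cream cheese contributes 100 calories, all of them from fat, and the share of calories from fat goes from roughly 7% to roughly 31%.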

ooo[fact]ooo ooo[useful]ooo

21 February 2006

A short course in human relations [fact]

Found this quite interesting and succinct

A short course in human relations

the 6 most important words:
"I admit I made a mistake."

the 5 most important words:
"you did a good job."

the 4 most important words:
"what is your opinion?"

the 3 most important words:
"if you please."

the 2 most important words:
"thank you."

the 1 most important word:
"we."

the least important word:
"I."
ooo[quote]ooo ooo[useful]ooo

Map of Latitude: 41.3185 Longitude: -72.92300, by MapQuest and Google

Found out how to query MapQuest and Google with latitude and longitude. Quite useful for walking the center of a map and precisely marking objects.
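A minimal sketch of building such a query in Python. Assumptions: it uses Google's currently documented Maps URLs scheme (`/maps/search/?api=1&query=lat,lon`), which is not necessarily the 2006 query format this note refers to; the function name `google_maps_url` is made up for illustration; and no MapQuest URL is shown because I am not sure of its exact format.

```python
from urllib.parse import urlencode


def google_maps_url(lat, lon):
    """Build a Google Maps search URL for a latitude/longitude pair.

    Assumes the documented Maps URLs scheme; not the 2006-era format.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude must be in [-90, 90], longitude in [-180, 180]")
    # Five decimal places is roughly one-metre precision on the ground.
    query = f"{lat:.5f},{lon:.5f}"
    return "https://www.google.com/maps/search/?" + urlencode({"api": 1, "query": query})


# The coordinates from the title of this entry.
url = google_maps_url(41.3185, -72.923)
print(url)
```

Nudging `lat` or `lon` by a small amount and rebuilding the URL is one way to step the map center, as described above.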


ooo[url]ooo ooo[fun]ooo ooo[useful]ooo

http://www.moobella.com/moobellastory.php [url]

Food dreaming...

ooo[url]ooo ooo[bioinfo]ooo

20 February 2006

Zitomer Store

Has lots of small pens


Chemical Biophysics Mini-Symposia Series - Upcoming Symposia


Reading For CPSC 445b/545b

Reading For CPSC 445b/545b

Covered by Thursday, Feb 2, 2006

Witten and Frank (or the equivalent):

Chap 1, 2, 3 (introduction, data processing): pp 1-82
Chap 4, Sections 4.1 and 4.3 (classification): pp 83-88, 97-105
Chap 5, Sections 5.1, 5.2, 5.3, and 5.4 (evaluation): pp 143-152
Chap 6, Sections 6.1 and 6.2 (implementations): pp 189-213
Chap 9 (introduction to Weka): pp 345-361
Chap 10 (introduction to Weka Explorer): pp 369-414

URLs for some Two Crows Datamining Materials:


URL for Thearling overview of datamining and glossary:

URL for Article on Gene Mining:

A URL for Friedman’s Paper on “The Connection Between Statistics and Datamining”


Data mining lecture slides

Data mining lecture slides for courses based on reference text books:

Ian Witten and Eibe Frank:


Margaret Dunham:


Pang-Ning Tan, Michael Steinbach, and Vipin Kumar:


Jiawei Han and Micheline Kamber:

http://www.cs.uiuc.edu/class/fa05/cs412/schedule.htm link + bioinfo + teaching

Disprot - Database of Protein Disorder


A Comparison of Search Technology Part 1 - Beginner SEO

Overview of the technology behind the "big four" search engines (Google, Yahoo, MSN, Ask Jeeves) (From AS)


yahoo! vs. google: Bioinformatics


MIT Sloan Lecture on Logistic Regression

Used in Data Mining Course


CSE5230: Data Mining Lectures

Used in Data Mining Course


19 February 2006

Karl Pearson - Wikipedia, the free encyclopedia

Correlation coefficient + eugenics


17 February 2006

Some bioinformatics people potentially interested in phenotypes [url]


Some bioinformatics people [url]

Some bioinformatics people

Mike Daly - http://www.broad.mit.edu/personal/mjdaly
Michael Cherry - http://genetics.stanford.edu/~cherry
Rainer Fuchs - http://bioitworldexpo.com/live/26/events/26BOS05A/conference/bio//CMONYA00B50F
Isaac Kohane - http://www.childrenshospital.org/cfapps/research/data_admin/Site113/mainpageS113P0.html
ooo[url]ooo ooo[bioinfo]ooo

13 February 2006

Boys from Brazil [clip]

Just saw this. Amazing how it anticipates the current cloning situation.

ooo[movie]ooo ooo[fun]ooo

12 February 2006

Genomic fossils as a snapshot of the human transcriptome -- PNAS [clip]

Heard of this earlier... based on screening pseudogene.org!

Proc Natl Acad Sci U S A. 2006 Jan 31;103(5):1364-9. Epub 2006 Jan 23.  
Genomic fossils as a snapshot of the human transcriptome.
Shemesh R, Novik A, Edelheit S, Sorek R.
Processed pseudogenes (PPGs) are cDNA sequences that were generated
through reverse transcription of mature, spliced mRNAs and have
subsequently been reinserted at a new genomic location. These cDNA
sequences are usually no longer transcribed and are considered "dead on
arrival." Here we show that PPGs can be used to generate a map of the
transcriptome. By analyzing thousands of human PPGs, we were able to
discover hundreds of transcript variants so far unidentified. An
experimental verification of a subset of these variants by RT-PCR
indicates that most of them are still active in the human transcriptome.
Furthermore, we demonstrate that PPGs can enable the identification of
ancient splice variants that were expressed ancestrally but are now
extinct. Our results show that the genome itself carries a "virtual cDNA
library" that can readily be used to analyze both present and ancestral
transcripts. Our approach can be applied to sequenced metazoan genomes
to computationally annotate splicing variation even when expressed
sequences are unavailable.

ooo[clip]ooo ooo[bioinfo]ooo ooo[pseudogene]ooo

Invitrogen - ProtoArray™ Home

Brad Love
PPT on this

data bleach


08 February 2006

Refining Protein Subcellular Localization -- PLoS Comput Biol [clip]

Update of Drawid & Gerstein (2000), adding interactions as a feature and predicting subcellular localization in a finer-grained fashion. Perhaps could have done a better comparison with past work.

Citation: Scott MS, Calafell SJ, Thomas DY, Hallett MT (2005) Refining Protein Subcellular Localization. PLoS Comput Biol 1(6): e66
The study of protein subcellular localization is important to elucidate protein function. Even in well-studied organisms such as yeast, experimental methods have not been able to provide a full coverage of localization. The development of bioinformatic predictors of localization can bridge this gap. We have created a Bayesian network predictor called PSLT2 that considers diverse protein characteristics, including the combinatorial presence of InterPro motifs and protein interaction data. We compared the localization predictions of PSLT2 to high-throughput experimental localization datasets. Disagreements between these methods generally involve proteins that transit through or reside in the secretory pathway. We used our multi-compartmental predictions to refine the localization annotations of yeast proteins primarily by distinguishing between soluble lumenal proteins and soluble proteins peripherally associated with organelles. To our knowledge, this is the first tool to provide this functionality. We used these sub-compartmental predictions to characterize cellular processes on an organellar scale. The integration of diverse protein characteristics and protein interaction data in an appropriate setting can lead to high-quality detailed localization annotations for whole proteomes. This type of resource is instrumental in developing models of whole organelles that provide insight into the extent of interaction and communication between organelles and help define organellar functionality.
ooo[clip]ooo ooo[bioinfo]ooo ooo[mining]ooo

06 February 2006

http://arep.med.harvard.edu/PGP [url]

URL to George Church's "Personal Genome Project." Lots of DB, identification and privacy issues associated with this!

ooo[url]ooo ooo[bioinfo]ooo

Four recent reviews on protein networks [clip]

Four recent reviews on protein networks (based on some research help from PMK).

1 - Nat Rev Genet. 2004 Feb;5(2):101-13. Network biology: understanding the cell's functional organization. Barabasi AL, Oltvai ZN.

2 -
A SIAM review, very in depth covering a variety of topics in networks
(more than just biological networks), community structure is covered quite well. (LONG)


3 -
Curr Opin Struct Biol. 2004 Jun;14(3):292-9. Protein interaction networks from yeast to human. Bork P, Jensen LJ, von Mering C, Ramani AK, Lee I, Marcotte EM.
A recent review about different methods of protein interaction detection
and analysis.

4 -
The Yale perspective on networks.
ooo[clip]ooo ooo[bioinfo]ooo ooo[networks]ooo

Google's Report Card - Forbes.com


SBGrid: Structural Biology Grid


05 February 2006

HHMI: Lab Management