Ken Krugler - Files, files everywhere and not a tag to see.
Vertical search engine for programmers. 500+ repositories, 20M files. 50K+ metadata repositories. 40M pages from key domains.
Add “code wikis” to talk about code.
Cannot reasonably -build- all of this code! Different languages, gcc versions, …
“DOAP” – XML-based description of OSS projects – but not everyone uses this!
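A minimal sketch of how project-level tags might be pulled from a DOAP file where one exists. The sample document, the project name in it, and the helper `doap_tags` are made up for illustration; only the DOAP namespace and the `Project`, `name` and `programming-language` elements come from the DOAP vocabulary itself. This is not a description of how the speaker's system does it.

```python
import xml.etree.ElementTree as ET

# Namespace of the DOAP vocabulary.
DOAP = "http://usefulinc.com/ns/doap#"

# Hypothetical sample file, invented for this sketch.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#">
  <Project>
    <name>example-project</name>
    <shortdesc xml:lang="en">A made-up project used for illustration.</shortdesc>
    <programming-language>Java</programming-language>
    <programming-language>Python</programming-language>
  </Project>
</rdf:RDF>
"""


def doap_tags(xml_text):
    """Return the project name and languages, or None if no DOAP project is found."""
    root = ET.fromstring(xml_text)
    project_tag = f"{{{DOAP}}}Project"
    # The Project element may be the root or wrapped in rdf:RDF.
    project = root if root.tag == project_tag else root.find(project_tag)
    if project is None:
        return None  # Not everyone uses DOAP, so callers need a fallback.
    name = project.findtext(f"{{{DOAP}}}name")
    languages = [el.text for el in project.findall(f"{{{DOAP}}}programming-language")]
    return {"name": name, "languages": languages}


if __name__ == "__main__":
    print(doap_tags(SAMPLE))
```

Returning None rather than raising reflects the point above: projects without DOAP metadata are the common case, not an error.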
Significant UI issues – a user might specify a version, but the tool may work only with the latest version. Need to propagate tags as code is harvested by different projects, possibly with modifications.
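One way tag propagation could be sketched, assuming tags are keyed by a fingerprint of the file content rather than by its path or project. The names `fingerprint` and `TagIndex` are hypothetical. The sketch only survives whitespace changes; code copied "with modifications" would need a fuzzier fingerprint than a plain hash.

```python
import hashlib
from collections import defaultdict


def fingerprint(source):
    """Hash of the source with indentation and blank lines ignored (an assumption)."""
    normalized = "\n".join(
        line.strip() for line in source.splitlines() if line.strip()
    )
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class TagIndex:
    """Hypothetical index that attaches tags to content, not to a location."""

    def __init__(self):
        self._tags = defaultdict(set)

    def tag(self, source, *tags):
        self._tags[fingerprint(source)].update(tags)

    def tags_for(self, source):
        # Return a copy so callers cannot mutate the index.
        return set(self._tags.get(fingerprint(source), set()))


if __name__ == "__main__":
    index = TagIndex()
    original = "int add(int a, int b) {\n    return a + b;\n}\n"
    index.tag(original, "arithmetic", "utility")

    # The same file harvested into another project, re-indented.
    copied = "int add(int a, int b) {\n\treturn a + b;\n}\n\n"
    print(index.tags_for(copied))
```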
Tags in different languages – but tags are often too short to automatically recognize the language.
Willing to make APIs available gratis to USPTO. (Google makes APIs available under Creative Commons license. Choose to throttle, put up with people “harvesting”.)
Expose tag values and usage to users? Folksonomy quality still a matter for research.
Tags on “what it does” vs. “how it does it”.
Fuzzy matching to relate good documentation to related code? Structure matching requires heuristics, which differ from language to language. Need things to be automated and scalable to 100K projects.
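A rough sketch of one language-independent form of fuzzy matching: compare the words in a documentation page with the words hidden inside code identifiers, and score the overlap (Jaccard similarity). This is a swapped-in illustration, not the method described in the talk; the function names and the sample strings are assumptions. It deliberately avoids structure matching, which is where the per-language heuristics come in.

```python
import re


def tokens(text):
    """Lower-cased words of 3+ letters, splitting camelCase and snake_case."""
    words = re.findall(r"[A-Za-z][a-z]+|[A-Z]+(?![a-z])", text)
    return {w.lower() for w in words if len(w) > 2}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_doc(code, docs):
    """Return (doc_id, score) for the documentation text closest to the code."""
    code_tokens = tokens(code)
    scored = ((doc_id, jaccard(code_tokens, tokens(text))) for doc_id, text in docs.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    code = "def parseHttpHeader(raw_header): ..."
    docs = {
        "http": "How to parse an HTTP header from raw bytes",
        "sort": "Sorting a list with quicksort",
    }
    print(best_doc(code, docs))
```

Set overlap is cheap enough to run across many projects, but it ignores word order and synonyms, so it is best read as a first-pass filter.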