Moving On.
August 19, 2008 on 12:04 pm | by ayman | In General, Media and Community, Media in Context, News | 3 Comments
End of summer. The beginning of the 2008 academic year. It’s a time when we start to think about our accomplishments and ask ourselves ‘what’s next?’. To date, Yahoo! Research Berkeley rocked. We released several project prototypes, published work in major conferences, and gave talks around the world. Most importantly, our work has now become part of what’s next for Yahoo! - which jibes with what our lab set out to do in its charter. Zync is now a key video sharing component in Yahoo! Messenger. The ZoneTag cell spotting is incorporated into Y!Go 3.0. We transformed Fire Eagle into reality (and watched it launch publicly last week). And there’s more on the way.
Part of graduation is moving on. While our work continues to filter into the real world, so do we. The Yahoo! Research Berkeley team has been largely assimilated into different parts of Yahoo!. You can find our “graduates” and “alumni” all over the place: from Yahoo! Brickhouse to Yahoo! Research to Flickr, Yahoo! Mobile, and Yahoo! Groups.
As our work here is done, this blog will no longer be active. We don’t want to leave you without a fair share of entertainment, so let us list a few choices. You can check out new things that are happening in research at Yahoo! You can also follow us individually on various fronts. Ayman and Naaman take their show on the road right here with rants and complaints galore. The Racing Geek is posting his thoughts on technology (and racing). Some of our super-interns are posting their thoughts as well - check out Ryan, Dean, Chris (we are still learning from them!).
Finally, you can continue to follow some of our projects via a new venue; some of you have seen us posting there already.
See you in the future!
All of us on the Y!RB Team.
Keeping Busy
January 23, 2008 on 2:30 pm | by Mor | In General, News | 1 Comment
Hey there, readers - we’re still here. We’ve just been busy making all the things we worked on at the lab real in some way or another (and yes, Fire Eagle is coming soon). So we’re not blogging every few minutes like some other sites… but we did write a few interesting papers in the meantime (come see us at CHI 2008 and WWW 2008).
We’ll tell you more soon… stay tuned. And for more frequent entertainment, click Next!
Why do we write?
September 20, 2007 on 9:11 am | by Mor | In General | 3 Comments
Phew. The CHI 2008 paper deadline was yesterday, and our people were involved in a total of five different papers. That’s a lot of work, and a lot of time. Sometimes we need to remind ourselves why we do it.
Indeed, people inside and outside the Yahoo! organization often ask “Why do we publish academic papers?”. On its face, it would seem like writing papers is not only a waste of corporate time and money, but may also expose techniques and valuable knowledge and insights to competitors. Indeed, other companies (that shall remain nameless) had generally not encouraged participation in academic discourse by their (sometimes) brilliant researchers. Luckily, in our Advanced Development Research lab (aka Y!RB), the approach has been a little different. While we are not under constant pressure to publish papers, it is certainly encouraged and expected that we do.
Asking questions like ‘why do we publish’ usually grows into a much larger and more philosophical question: “What is research?” Let’s not go there quite yet – more on that later.
To me, there are a number of reasons that make academic papers a worthwhile endeavor (warning: personal opinions follow; the list below and the words above do not represent the views of Yahoo!, Yahoo! Research, Yahoo! Advanced Development Division, or my dog Mingus).
1. Writing makes you organize your thoughts. It’s akin to authoring a presentation; one that you are required to submit in advance and that can be rejected. To write a research paper that will get accepted to a top conference, you need to crystallize your thoughts, express your hypothesis and claims clearly, and be able to show significant results. The process forces you to do a better job in understanding, situating, evaluating and defending your work and its different components. I cannot tell you how many times we have started writing a paper just to realize, based on the initial writing, that we are doing something wrong (or not well enough).
2. Publishing is the best way to get feedback that in turn can validate and improve your work. At the basic level, getting a paper into a major conference serves as external validation that your work is worthwhile. You can be working in the dark for years and keep patting yourself on the back - but convincing the reviewers of a major conference or journal that your work is important and interesting is a mark of success that can be trusted (let’s assume a perfect reviewing system for the moment - this is certainly not often the case). At a different level, feedback from reviewers, people who read your work, or (more often) from those who attended your conference presentation, can greatly improve your work. People are often keen, perhaps too keen (argh!), to give you ideas related to your presented work or how to do it better.
3. Publishing papers gives back to the community and facilitates and invigorates academic discourse. Other than warm fuzzies, this gives you a chance to make an impact that exceeds the boundaries of your organization. Of course, the contribution gets you - and your company - credit as good citizens in the academic community.
4. Publishing is an opportunity to steer and inform the research community about a direction in which you are invested. Simply by writing about a new research problem, there is a chance that other researchers will become interested and start looking at the same domain. Such a chance to transform and inspire other brilliant researchers requires a well-thought-out problem definition and some initial attempt at tackling it (i.e., a research paper).
5. [This is a weird one] In a large company, publishing a paper in a conference can be the first time when the relevant people in the company are exposed to your work. As absurd as this may sound, in a large organization, internal communication is as difficult as one may imagine. A conference brings people with the same interests together and they will find you instead of you having to find them. For example, our CHI 2007 papers had little exposure internally at Yahoo! before the conference. Many of our UED and UER people were exposed to this work at the conference, and a much wider internal discussion followed. This is not a Yahoo! thing. I have also heard stories from friends in other research labs that had their research “discovered” by product teams when presented in public conferences.
6. Publishing leads to recruiting from academia. Nothing tells exceptional students (and faculty) that Yahoo! is the best place for them like a brilliantly delivered presentation of deep and thoughtful ideas. And sometimes, even our presentations are enough to attract such interest. At least three researchers and interns in our lab are here mostly because they have seen a researcher speak at a conference or another venue about our work; countless other CVs were received.
7. Writing can be an outlet of creativity. We’re not all Dick Bulterman (see here for example), but at least we can pretend…
Anything I missed?
Dilbert Author Invents ZoneTag
August 28, 2007 on 9:19 am | by Mor | In Media in Context, Mobile, News, ZoneTag | 1 Comment
From a recent post by Scott Adams on his Dilbert Blog:
First, my digital camera should have GPS so it always knows where I am. When I download my photos, a Google map would pop up, and the photos would go into storage according to the points on the map where the pictures were taken, ordered by date. The map forms the backdrop for organizing the scrapbook.
Second, I would use a special credit card for all purchases on my vacation, from gas stations to hotels to restaurants. The special part is that the records of my purchases would feed into my automatic scrapbook software and coordinate it with the camera’s GPS data. That would be enough data for the scrapbook system to intelligently guess the name of the restaurant or attraction where I was at the time of the picture.
Third, the system needs face recognition software so it can label photos with at least the names of family and friends who appear in them. It doesn’t need to be 100% accurate, but it could give you a big head start.
Minus the face recognition (which ZoneTag compensates for by suggesting tags based on your history and the tag’s likelihood, which often gets the names of the people in your images), we’re already there. And there’s no need to get into your credit report, Scott! ZoneTag knows where you are and will show you names of restaurants, landmarks and attractions around you as you take a photo. Just click, and it’s captured.
Get a Nokia N95, Scott, and start ZoneTagging. If you need help setting up, feel free to drop us a line. We’ll see what we can do (an originally-signed strip of Dilbert will get you up and running — and you know what — we’ll send you the phone pre-installed as well…).
Flickr Fountain of Knowledge
July 31, 2007 on 10:00 am | by Mor | In General, Media in Context, Social Media | 1 Comment
What can we learn from Flickr? Well, for one, we have learned that there are a lot of people who like to take photographs and share them publicly. Who would have guessed! However, my question refers to a different type of knowledge: information about the world that is implicitly encoded in the activity on Flickr.
You do not need to go far to see a simple yet brilliant example of such knowledge: check out Flickr’s tag clusters (here are the clusters for love, jaguar, Taj Mahal, hack). Using tag co-occurrence on Flickr photos, Flickr’s clustering can break down a term into multiple semantics or meanings: Jaguar, for example, is the animal as well as the car and the guitar: the first co-occurs with the tags “zoo” and “cat”; the second meaning of “jaguar” appears with “car” and “auto”. Note that these meanings are not mined from any other resource: they represent some “knowledge” that is generated automatically from the implicit contributions of Flickr users uploading and tagging their photos.
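To make the co-occurrence idea concrete, here is a minimal Python sketch. It is not Flickr's actual clustering algorithm (which is not described here); it simply assumes that two tags belong to the same sense of a term when they appear together on a photo carrying that term, and groups them by connected components. The function name `tag_senses` and the sample photos are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tag_senses(photos, target):
    """Group tags that co-occur with `target` into clusters (rough 'senses')."""
    # Link two co-tags whenever they appear on the same photo as `target`.
    neighbors = defaultdict(set)
    for tags in photos:
        if target not in tags:
            continue
        co_tags = set(tags) - {target}
        for tag in co_tags:
            neighbors[tag]  # register the tag even if it has no links
        for a, b in combinations(co_tags, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Each connected component of the link graph is one cluster.
    clusters, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            tag = stack.pop()
            if tag in component:
                continue
            component.add(tag)
            stack.extend(neighbors[tag] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Made-up sample data: each photo is just its set of tags.
photos = [
    {"jaguar", "zoo", "cat"},
    {"jaguar", "cat", "animal"},
    {"jaguar", "car", "auto"},
    {"jaguar", "auto", "classic"},
    {"dog", "park"},
]
print(tag_senses(photos, "jaguar"))
# [['animal', 'cat', 'zoo'], ['auto', 'car', 'classic']]
```

On real data a single photo tagged with both "cat" and "car" would merge the two clusters, so a practical version would need to weight links by how often they occur rather than treat every co-occurrence as equal.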
In other examples, Patrick Schmitz developed a different co-occurrence model that allowed him to generate subsumption data in Flickr tags (e.g. San Francisco is subsumed by California). The work at Yahoo! Research on TagLines and at our own lab on Tag Maps had shown that Flickr community activity generates descriptive labels for events and locations.
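For a feel of what subsumption from co-occurrence can look like, here is a small Python sketch. It does not reproduce Patrick's model; it uses a generic rule under assumed settings: tag A subsumes tag B when most photos tagged B also carry A, but not the other way around. The `threshold` and `min_count` values, the function name, and the sample photos are all illustrative assumptions.

```python
from collections import Counter
from itertools import permutations


def subsumption_pairs(photos, threshold=0.8, min_count=2):
    """Return (broad, narrow) pairs where `broad` appears to subsume `narrow`."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in photos:
        tags = set(tags)
        tag_count.update(tags)
        # Ordered pairs, so (a, b) and (b, a) are both counted.
        pair_count.update(permutations(tags, 2))

    pairs = []
    for (broad, narrow), together in pair_count.items():
        if tag_count[narrow] < min_count:
            continue  # too few photos to say anything about this tag
        p_broad_given_narrow = together / tag_count[narrow]
        p_narrow_given_broad = together / tag_count[broad]
        if p_broad_given_narrow >= threshold and p_narrow_given_broad < threshold:
            pairs.append((broad, narrow))
    return sorted(pairs)


# Made-up sample data: each photo is just its set of tags.
photos = [
    {"california", "sanfrancisco", "bridge"},
    {"california", "sanfrancisco"},
    {"california", "losangeles"},
    {"california", "yosemite"},
    {"california", "sanfrancisco", "fog"},
]
print(subsumption_pairs(photos))
# [('california', 'sanfrancisco')]
```

The `min_count` guard matters: without it, every tag that appears on a single photo would look "subsumed" by whatever else happens to be on that photo.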
Last week, in Amsterdam, as part of SIGIR 2007, we added yet another method of extracting knowledge from Flickr. The paper, “Towards Automatic Extraction of Event and Place Semantics from Flickr Tags”, by Tye Rattenbury, Nathan Good (two of our star interns) and myself*, begins to answer a simple question: given a tag that appears on Flickr (such as “dog”, “SIGIR 2007”, or “Yahoo! Research Berkeley”), can we automatically determine whether or not that tag refers to a specific place, and whether or not the tag refers to a specific event? As you may guess, SIGIR 2007 refers to an event, Yahoo! Research Berkeley is a place, and “dog” is neither a place nor an event.
Knowing if a tag is a place or event leads to better image search, but can also help us to better visualize the Flickr data; generate automatic event and place gazetteers; associate missing time/location metadata based on tags, and more.
I will not get into the details of how we propose to extract the place/event knowledge from Flickr; you can get these details in our paper (pdf). I will just mention that we are using the dataset of geotagged Flickr photos, and looking at the time and location distributions for each individual tag in the dataset. If the location or time distribution for a tag has a specific “structure” to it, we classify that tag as a place or event, accordingly.
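As a rough illustration of what "structure" in a distribution could mean, here is a toy Python sketch. It is not the method from the paper: it assumes a much cruder test, namely that a tag is place-like when most of its photos fall into one map grid cell and event-like when most fall on one day. The grid size, threshold, minimum photo count, function names, and sample data are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def peak_share(values):
    """Fraction of values that fall in the single most common bucket."""
    counts = Counter(values)
    return max(counts.values()) / len(values)


def classify_tags(photos, cell_deg=0.1, threshold=0.7, min_photos=3):
    """photos: iterable of (tag, latitude, longitude, day_number) tuples."""
    cells, days = defaultdict(list), defaultdict(list)
    for tag, lat, lon, day in photos:
        # Snap each location to a coarse grid cell of cell_deg degrees.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[tag].append(cell)
        days[tag].append(day)

    labels = {}
    for tag in cells:
        if len(cells[tag]) < min_photos:
            continue  # not enough photos to judge this tag
        labels[tag] = {
            "place": peak_share(cells[tag]) >= threshold,
            "event": peak_share(days[tag]) >= threshold,
        }
    return labels


# Made-up sample data: (tag, latitude, longitude, day of year).
sample = [
    ("yrb", 37.870, -122.270, 1),
    ("yrb", 37.871, -122.272, 40),
    ("yrb", 37.873, -122.268, 90),
    ("yrb", 37.872, -122.270, 200),
    ("solstice", 37.87, -122.27, 172),
    ("solstice", 52.37, 4.85, 172),
    ("solstice", 40.71, -74.00, 172),
    ("solstice", 35.68, 139.69, 172),
    ("dog", 37.87, -122.27, 3),
    ("dog", 52.37, 4.85, 120),
    ("dog", 40.71, -74.00, 250),
    ("dog", 35.68, 139.69, 300),
]
labels = classify_tags(sample)
print(labels["yrb"])       # {'place': True, 'event': False}
print(labels["solstice"])  # {'place': False, 'event': True}
print(labels["dog"])       # {'place': False, 'event': False}
```

A fixed grid and a fixed day bucket are the weak points of this toy: a tag for a whole country or a week-long festival would be missed, which is one reason to look at the distributions more carefully than a single threshold allows.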
Below, you can follow the presentation slides I gave at SIGIR, or just jump directly to the paper to get the full story.
While the debate on the “Is the semantic web dead?” question continues, “emerging semantics” are alive and kicking. What other knowledge can be extracted from the Flickr dataset?
* “Towards” is a code word in research papers meaning “we didn’t take the research all the way quite yet but want to make the paper sound important nevertheless” - we try not to use it too much.