Full Time / San Francisco
About Scribd: http://www.scribd.com
Scribd is the world’s largest social reading and publishing company with more than 70 million readers every month. We've made it easy to share and discover entertaining, informative and original written content across the web and mobile devices. Scribd is where your content finds an audience, where people connect with the information that matters most to them, where the world comes to read.
Our patent-pending HTML5 conversion technology has democratized the publishing process. Now anyone can instantly upload any file (including PDF, Word and PowerPoint) and transform it into a web document that's discoverable through search engines, shared on social networks and read on billions of mobile devices. And with Scribd Readcast, Facebook Instant Personalization and "like" integration, we're transforming reading from a solitary experience into a social one. Every day, millions of people contribute to the conversations happening on Scribd by commenting, rating and Readcasting to friends on Scribd, Facebook and Twitter.
Scribd headquarters is an airy, naturally lit loft in the heart of San Francisco's SOMA technology district. Our team is talented, passionate, motivated and above all fun. You'll often find us holding group brainstorming sessions, eating together as a team, playing winner-take-all ping-pong and pool, and celebrating our successes along the way. We offer competitive salaries, generous equity stakes, stellar benefits, flexible work hours, catered lunches and dinners, and a stocked kitchen.
Scribd is a small, San Francisco-based company backed by some of the top investors in the world, including Charles River Ventures, Redpoint Ventures, the Kinsey Hills Group, Paul Graham's Y Combinator, and several well-known angel investors. Recently named one of the World Economic Forum's 2011 Technology Pioneers at Davos, Scribd continues to have a global impact on how people read. To learn more about Scribd, visit our press page.
Are you passionate about artificial intelligence, data mining, and the web? Have you ever wondered what it would have been like to join Google in its early days?
Scribd is building a new team to tackle a still-confidential project that involves crawling a large subset of the web and mining it for structured data. This is an engineer's playground of interesting technical problems and a great opportunity to get in on the ground floor of a new product.
The Ideal Profile
We're looking to add people with varied experience levels, from seasoned veterans of web crawling to talented engineers who can learn fast and would like to try something new.
The ideal candidate is someone who loves the web and web technologies but who, rather than continuing to build their own sites, is interested in making the web more useful by understanding the data already there. Building a great web crawling operation takes a combination of formal artificial intelligence techniques, system scalability work, and clever heuristics, with a helping of good judgement. For the right person, this should sound like a really fun problem.
Responsibilities
- Work with the team to develop software that can mine structured data from the web
- Continually test, refine, and improve the accuracy of the extraction technology
- Develop and scale a distributed web crawling engine
- Use a combination of formal and heuristic techniques to cover the largest possible set of cases with the minimum amount of effort
This position reports to the Director of Engineering.
Location: You are preferably located near San Francisco, CA. Relocation assistance is determined on a case-by-case basis. In short, we'll get creative to get you here.
Please email your cover letter and resume, with the subject "Your name – Engineering – web data mining – via github", to email@example.com. All communication and correspondence is held in the strictest confidence, so you can connect and learn more without exposure.