Growth5 Blog

Friday, October 16, 2009

Scraping the Web With Google Docs

Do you use Google Docs? We do - for everything. We use the "spreadsheet" functionality for budgeting, projections, time tracking, element inventory, project traffic, etc... The "Word" functionality for editing legal documents, writing marketing text and other materials.

Why is it so great? A number of reasons, but the main one is that you can have multiple people looking at the same Google Doc online at the same time from anywhere in the world, editing that document and seeing the updates of each other's work in real time.

Plus, the functionality that Google keeps updating is increasingly phenomenal. Example: one of the functions I really like is =GoogleFinance("symbol"; "attribute"). You can insert this formula into your Google Doc spreadsheet and embed stock data that will update continuously (about a 20 minute delay). We use this to publish data on "comparable" companies to our portfolio firms that can be a quick reference for anyone in the field. Google docs can be published to the web so they can be shared with anyone who has the url (they don't need to log-in to Google Docs) or the docs can be shared with just the people you want to see it.

Google realized that even with a formula like GoogleFinance, they wouldn't be able to cover all the data that someone might need, so they created a number of formulas that scrape the web. One that I am using a lot these days is called ImportHTML. This formula allows you to point to any web page and pull data from that page.

Our portfolio company, Vaccinogen, has a colon cancer vaccine currently available in Switzerland - the product is priced in Swiss Francs. GoogleFinance doesn't cover currency rates, so I use ImportHTML to grab the exchange rate for Swiss Francs (CHF) compared to the US Dollar. Once I have the exchange rate, I multiply that rate by the Swiss sales price for the vaccine to figure out what the price of the vaccine is that day in US Dollars. I saved a bookmark for this Google Doc on the home screen of my iPhone so with one click I can have the price any time I need it.

Here's how it works. To get this exchange rate, you could go to finance.yahoo.com and type in "USDCHF" and this is what would be returned.















If I want to import the "Bid" price from finance.yahoo.com into my Google Doc spreadsheet, I can enter the following formula:

=Index(ImportHTML("http://finance.yahoo.com/q?s=USDCHF=X","table",1),8,2)

I have highlighted the last three numbers of this formula in different colors because they are the most important.

1: this number indicates to use the first table that is found on the result page.

8: this number indicates to use the 8th row of that table.

2: this number indicates to use the second column of that table.

The resulting number in the Google Doc would be the most recent bid price of 1.02. Of course, if finance.yahoo.com ever changes the way their data is displayed, or the url that is used to retrieve this data, you might have to update your Google Doc, but thankfully, these type of pages don't change that often.

Did you know you can pull data from one Google Doc spreadsheet to another? I will post about that some time next week.

Labels: ,

6 Comments:

At April 6, 2010 at 12:04 AM , Blogger Scott said...

How would I import dividend yield from yahoo into Google Finance for a large quantity of symbols without typing in each symbol? For example, I want to import the symbol in field A5, B5, C5, etc as the symbol in =Index(ImportHTML("http://finance.yahoo.com/q?s=USDCHF=X","table",1),8,2)

 
At April 6, 2010 at 1:17 AM , Blogger Robert Beal said...

Scott,

Try this:

=Index(ImportHTML("http://finance.yahoo.com/q?s="&A5&"=X","table",1),8,2)

I just tested it, and it seemed to work fine. I put the symbol "USDEUR" in B5, then dragged the updated formula from column A to Column B which created the following formula which "foots" to B5:

=Index(ImportHTML("http://finance.yahoo.com/q?s="&B5&"=X","table",1),8,2)

At 1:11 am Eastern on Tues 4/6/10 the price it returns is .7451 for the USDEUR currency exchange example. Good luck with your dividend yields.

Hope this helps,

Robert

 
At October 14, 2010 at 5:59 AM , Blogger David said...

Hi, Thanks for posting this information. I have tried to implement this on my portfolio spreadsheet. I cannot however accomplish it. The best that I can do is to import the whole table.

I am trying to import the dividend yield figure from the following web page:

http://fool.digitallook.com/?action=financials&ticker=AV.&sub_action=forecasts

The result is this in my spreadsheet:

https://spreadsheets1.google.com/ccc?key=tq3bHkd-ZbO5wa7p7jO-Ryg&hl=en_GB#gid=14

when I alter the code along the lines that you suggest by putting extra numbers in representing the rows and columns I get an error

I wonder if you know where I am going wrong?

cheers

Dave

 
At October 14, 2010 at 9:51 AM , Blogger Robert Beal said...

David,

Thanks for your question. I made a copy of your Google Doc and attempted to grab the *Yield* you're looking for. I think the part of the formula you were missing was the Index portion of the formula. This allows you to choose the exact row and column of the table you're importing from.

Please try this formula:

=Index(ImportHtml("http://fool.digitallook.com/?action=forecasts&ticker=AV.","table",5),2,8)

This formula goes to the table for the symbol "AV." and grabs the data from table 5 at row 2 and column 8 which is data for "Yield". In my Google Doc, this formula returns 6.4%.

Please give this formula a shot, and let me know how it goes. If for some reason that doesn't work, there's other things we can try.

Good Luck!

-Robert

 
At February 21, 2011 at 2:39 AM , Blogger Execution Specialist said...

How can you scrape yahoo finance earnings date? Example I have a symbol in goog docs in A2 and want to know when the earnings of http://biz.yahoo.com/research/earncal/p/pcln.html

I want to know the "feb 23" date .

thx.

 
At February 21, 2011 at 11:14 PM , Blogger Robert Beal said...

Execution Specialist,

Here's one way to do it. Since we know that the url we need to yield the earnings date, we can write a formula to just simulate that url in Google Docs.

If your stock symbol is in A2, put the following formula in A3:

=left(A2)

Let's use the example of Microsoft. If you had their symbol "msft" in A2, you would now have the first letter of that symbol in A3, "m".

We need this for the following formula that you can put in A4:

=ImportHTML("http://biz.yahoo.com/research/earncal/"&A3&"/"&A2&".html","table",3)

You can see what we're doing here, we're creating the url:

http://biz.yahoo.com/research/earncal/m/msft.html

We are also simultaneously pulling in the 3rd table of that url we've created, which yields us the next earnings date for msft: April 20, 2011.

Hope this helps,

Robert

 

Post a Comment

Subscribe to Post Comments [Atom]

<< Home