Web scraping for Weather information

31 12 2005

This is the first time I have used Python to scrape information from the web, so I want to write down some notes from the experience.

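 1. About Regular Expressions:
   [^>]*: zero or more characters other than >; handy for skipping the rest of a tag, e.g. everything after <font up to its closing >;
   \(: put a backslash \ before any special symbol you want to match literally (here, the ( character);
   [^\n]*: zero or more characters other than a newline, i.e. everything up to the end of the current line;
   [\w][\w] or [\w]{1,2}: the first matches exactly two word characters, the second matches one or two (a word character is a letter, digit or underscore);
   [\w]* or [\w]+: zero or more, or one or more, word characters;
   ([A-Z][A-Z]?): one or two capital letters, captured as a group;
   [\d]: a single digit;
   | : means "or";
   [\d\.]+: one or more digits and dots, enough to match a decimal number such as 12.5;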
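
To make the patterns above concrete, here is a small sketch that applies them to one line of page source. The sample line and the variable names are made up for illustration; they are not taken from the weather page itself.

```python
import re

# Hypothetical line of page source, invented for this illustration.
line = '<font color="red">Temp (C): 12.5 Wind: NW 7</font>\nnext line'

# [^>]* skips the attributes inside the opening tag.
tag = re.search(r'<font[^>]*>', line).group(0)

# [^\n]* stops at the end of the first line.
first_line = re.match(r'[^\n]*', line).group(0)

# \( and \) match literal parentheses; the inner group captures the unit.
unit = re.search(r'\(([A-Z][A-Z]?)\)', line).group(1)

# [\d\.]+ picks up the first run of digits and dots.
temperature = re.search(r'[\d\.]+', line).group(0)

# One or two capital letters for the wind direction, [\d]{1,2} for the speed.
wind = re.search(r'Wind: ([A-Z][A-Z]?) ([\d]{1,2})', line)
direction, speed = wind.group(1), wind.group(2)

# | means "or": find either label.
labels = re.findall(r'Temp|Wind', line)
```

Note that [\d\.]+ would also accept strings such as "1.2.3" or a lone ".", so it is only a rough match for decimal numbers.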

 2. About Web Scraping:
    (1) open the URL –> sock=urllib.urlopen(url)  [urllib.request.urlopen(url) in Python 3];
    (2) get the source code of the page –> htmlSource=sock.read();
    (3) locate what you want to scrape with a regular expression
             –> matcher=re.compile(RE)
                   elements=matcher.search(htmlSource) or matcher.findall(htmlSource)
                   [search: returns a match object for the first match, or None if there is none;
                     findall: returns a list of all matches;]
    (4) read the results
            –> for x, y in enumerate(elements):
                 [x -> index in the list; y -> the value]
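
The difference between search and findall in step (3), and the enumerate loop in step (4), can be sketched as follows. The HTML string and the pattern are assumptions chosen for illustration; real page source would come from sock.read().

```python
import re

# Hypothetical page source; a real page would be read from the socket.
html_source = "<td>London 8.5</td><td>Paris 11</td><td>Oslo -2.0</td>"

matcher = re.compile(r'<td>([A-Za-z]+) (-?[\d\.]+)</td>')

# search returns a match object for the first hit only (or None).
first = matcher.search(html_source)
first_city = first.group(1) if first else None

# findall returns a list; with two groups, each item is a (city, temp) tuple.
elements = matcher.findall(html_source)

# enumerate yields the index x and the value y for each item.
rows = []
for x, y in enumerate(elements):
    rows.append("%d: %s %s" % (x, y[0], y[1]))
```

Because the pattern has two groups, each y is a tuple; with a single group, findall would return plain strings instead.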

 3. The whole program:
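
What follows is not the original listing, only a minimal sketch of how the steps from section 2 might fit together in Python 3. The page layout, the READING pattern, the function names and the sample text are all assumptions made for illustration.

```python
import re
import urllib.request

# Hypothetical pattern: assumes each reading looks like "<b>City</b>: 12.5 C".
READING = re.compile(r'<b>([^<]*)</b>: (-?[\d\.]+) ([A-Z][A-Z]?)')


def fetch_html(url):
    """Steps (1) and (2): open the URL and read its source as text."""
    sock = urllib.request.urlopen(url)
    try:
        # read() returns bytes in Python 3, so decode before matching.
        return sock.read().decode("utf-8", errors="replace")
    finally:
        sock.close()


def parse_readings(html_source):
    """Step (3): return a list of (city, temperature, unit) tuples."""
    return [(city, float(temp), unit)
            for city, temp, unit in READING.findall(html_source)]


def report(readings):
    """Step (4): format each reading together with its index."""
    return ["%d. %s: %.1f %s" % (x, y[0], y[1], y[2])
            for x, y in enumerate(readings)]


# Offline sample so the sketch runs without a network connection.
SAMPLE = "<p><b>London</b>: 8.5 C</p>\n<p><b>Oslo</b>: -2 C</p>"
lines = report(parse_readings(SAMPLE))

# With a real page: lines = report(parse_readings(fetch_html(url)))
```

Keeping the fetching and the parsing in separate functions makes it possible to test the regular expression on a saved copy of the page before hitting the live site.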


2 responses

5 01 2006
FlyFox

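Haha, this looks so familiar.
Looks like regular expressions; I studied this stuff a little a while back,
but I have forgotten most of it by now.
Let's compare notes when you have time.
Respect to my senior :)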

9 08 2006
alieksjii

Here are some links that I believe will be of interest
