This is my first time using Python to scrape information from the web, so I want to write down what I learned.
1. About Regular Expressions:
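[^>]*: any run of characters other than >, e.g. the attributes inside a tag, as in <font[^>]*>;
\(: put a backslash \ in front of any special symbol you want to match literally (here, the opening parenthesis ( );
[^\n]*: any run of characters other than a newline, i.e. everything up to the end of the current line;
[\w][\w] or [\w]{1,2}: [\w][\w] matches exactly two word characters (letters, digits or underscore); [\w]{1,2} matches one or two;
[\w]* or [\w]+: * means zero or more repetitions, + means one or more;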
([A-Z][A-Z]?): one or two capital letters;
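[\d]: a digit;
| : means or (a|b matches a or b);
[\d\.]+: a run of digits and dots, which covers floating-point numbers such as 3.14 (it also matches strings like 1.2.3, so it is only a loose check);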
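The patterns above can be tried out in a short sketch. It is written for Python 3, and the HTML snippet and sample strings are made up for illustration; they are not taken from any real page.

```python
import re

# Hypothetical HTML snippet, invented for this example.
html = '<font color="red">Price (USD): 12.50</font>\n<font size="2">Code: AB</font>'

# [^>]* skips the tag's attributes, stopping before the closing '>'.
font_texts = re.findall(r'<font[^>]*>([^<]*)</font>', html)

# \( and \) match literal parentheses; the unescaped pair is a capture group.
currency = re.search(r'\(([\w]+)\)', html).group(1)

# [^\n]* matches everything up to the end of the current line.
first_line = re.search(r'[^\n]*', html).group(0)

# [\w]{1,2} takes word characters one or two at a time.
pairs = re.findall(r'[\w]{1,2}', 'abcde')

# ([A-Z][A-Z]?) captures one or two capital letters.
capitals = re.findall(r'([A-Z][A-Z]?)', 'AB C')

# | is alternation: either side may match.
animals = re.findall(r'cat|dog', 'cat, dog, bird')

# [\d\.]+ is a loose number pattern: it also accepts '1.2.3'.
numbers = re.findall(r'[\d\.]+', 'Price: 12.50, version 1.2.3')

# * allows an empty match, + needs at least one character.
star = re.match(r'[\w]*', '!!')   # matches the empty string
plus = re.match(r'[\w]+', '!!')   # no match, so None

print(font_texts, currency, pairs, capitals, animals, numbers)
```

Writing the patterns as raw strings (the r'...' prefix) keeps Python from interpreting the backslashes before the re module sees them.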
2. About Web Scraping:
(1) open URL –> sock = urllib.urlopen(url) (Python 2; in Python 3 the call is urllib.request.urlopen(url));
(2) get source code of URL –> htmlSource = sock.read();
(3) locate what you want to scrape with a regular expression
–> matcher=re.compile(RE)
elements=matcher.search(htmlSource) or matcher.findall(htmlSource)
[search: returns a match object for the first match, or None if nothing matches;
findall: returns a list of all matches;]
(4) read results
–> for x, y in enumerate(elements):
[elements is the list returned by findall; x -> index in the list; y -> the value at that index]
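The four steps can be put together roughly as follows. This is a minimal sketch for Python 3, not the exact script these notes came from: the helper names fetch_source and extract, the UTF-8 decoding, and the sample page source are all assumptions made for illustration. The demo runs on a hard-coded string so it works without a network connection.

```python
import re
import urllib.request  # in Python 3, urlopen lives in urllib.request


def fetch_source(url):
    """Steps (1) and (2): open the URL and read its source code."""
    with urllib.request.urlopen(url) as sock:
        raw = sock.read()  # bytes in Python 3
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")


def extract(htmlSource, RE):
    """Step (3): compile the regular expression and collect every match."""
    matcher = re.compile(RE)
    return matcher.findall(htmlSource)


# Hypothetical page source, standing in for the result of fetch_source(url).
htmlSource = "<td>AB</td><td>12.5</td><td>C</td><td>7</td>"
RE = r"<td>([^<]*)</td>"

# search stops at the first match and returns a match object (or None).
first = re.compile(RE).search(htmlSource)

# findall returns a list; with one group, each item is that group's text.
elements = extract(htmlSource, RE)

# Step (4): walk through the results with their indices.
rows = []
for x, y in enumerate(elements):
    rows.append((x, y))
    print(x, y)
```

For a real page, htmlSource would come from fetch_source(url) with the address of the page to scrape.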