Data Scraping

You will need to write code to scrape this data.

The following data is to be scraped:

  1. All the agencies collected in an xls file and
  2. All the agents collected in a separate xls file.

Scraping the data for the Agencies.

To scrape the data, start from this page: https://gtgo.me/reiqbb

Search for agencies in each postcode area.

Here is an xls file of the postcodes to search through: post-codes-only-qld.xls

  • There are 436 postcodes to search through to collect the agencies.
  • Some of the postcodes do not have any agencies.
  • Most of the postcodes have fewer than 30 agencies.
  • Some postcode areas have more than 30 agencies. For example, in the screenshot below, the 4217 postcode area shows the message “Maximum of 30 results displayed. Please refine your search using the fields above.”
  • For postcode areas with more than 30 agencies you will need to rotate through the alphabet, searching with Agency Name Starts With set to a, b, c, d, and so on through z, to collect all the agencies for that postcode.
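The alphabet rotation could be sketched roughly as follows. This is only an illustration of the logic, not the site's API: the `search` callable, its `(postcode, name_prefix)` signature and the `"url"` key used to merge rows are all assumptions made for the example.

```python
import string
from typing import Callable, Dict, List

MAX_RESULTS = 30  # the search page displays at most 30 results

# Hypothetical signature: search(postcode, name_prefix) -> list of agency rows.
SearchFn = Callable[[str, str], List[Dict[str, str]]]


def collect_agencies(postcode: str, search: SearchFn) -> List[Dict[str, str]]:
    """Return the agencies for a postcode, rotating a-z when the cap is hit."""
    rows = search(postcode, "")
    if len(rows) < MAX_RESULTS:
        return rows
    # Cap reached: repeat the search once per starting letter and merge.
    seen: Dict[str, Dict[str, str]] = {}
    for letter in string.ascii_lowercase:
        for row in search(postcode, letter):
            # "url" as the unique key is an assumption for this sketch.
            seen.setdefault(row["url"], row)
    return list(seen.values())
```

Note that a plain a to z rotation would miss any agency whose name starts with a digit or symbol, so it may be worth checking the counts for the larger postcodes by hand.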


 


Clicking a link in the REIQ Accredited agency column opens the page where that Agency's details are displayed. This is where we collect the data for the Agencies.

On this page, we click the “List of Agents” button to get the list of the Agents associated with each agency.

The data of the Agencies should be collected and saved in this manner:

  • Agency
  • ABN Number (saved into the last field on each row)
  • Address
  • Phone
  • Fax
  • Website
  • Email
  • Contact Name (must be split and saved into the Contact First Name and Contact Last Name fields)
  • Post Code (the Post Code from the address must be duplicated and saved into the Post Code field)
  • Areas of Practice (all Areas of Practice items must be saved into the Areas of Practice field, each item separated by “ – ”, e.g. Auctioneer – Property Management – Residential Sales)
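The three field transformations above might look something like this. It is a minimal sketch under stated assumptions: the function names are made up for the example, the contact name is split on the first space, and the postcode is taken to be the last standalone 4-digit number in the address.

```python
import re
from typing import List, Tuple


def split_contact_name(full_name: str) -> Tuple[str, str]:
    """Split a contact name into (first name, last name) on the first space."""
    parts = full_name.split()
    if not parts:
        return "", ""
    return parts[0], " ".join(parts[1:])


def postcode_from_address(address: str) -> str:
    """Return the last standalone 4-digit number in the address, or ''."""
    matches = re.findall(r"\b\d{4}\b", address)
    return matches[-1] if matches else ""


def join_areas(areas: List[str]) -> str:
    """Join Areas of Practice items with ' – ' between them."""
    return " – ".join(a.strip() for a in areas if a.strip())
```

For example, `join_areas(["Auctioneer", "Property Management", "Residential Sales"])` gives `Auctioneer – Property Management – Residential Sales`. Names with more than two parts all go into the last name here, which may need adjusting once the real data is seen.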

All phone numbers, fax numbers and mobile numbers need to be formatted in international format: add a +61 prefix, delete the leading zero and strip the spaces. For example, the mobile number 046 888 3313 becomes +61468883313, and a landline or fax number such as 07 5538 0117 becomes +61755380117.
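One possible way to apply this formatting is sketched below. The function name `to_international` is hypothetical, and passing through numbers that already start with “+” is an assumption not covered by the rule above.

```python
import re


def to_international(number: str, country_code: str = "61") -> str:
    """Format a national phone number as +61... with no spaces."""
    number = number.strip()
    digits = re.sub(r"\D", "", number)  # drop spaces, brackets, dashes
    if not digits:
        return ""  # keep empty fields empty
    if number.startswith("+"):
        return "+" + digits  # assumed to be international already
    if digits.startswith("0"):
        digits = digits[1:]  # delete the leading zero
    return "+" + country_code + digits
```

With the two examples from the text, `to_international("046 888 3313")` returns `+61468883313` and `to_international("07 5538 0117")` returns `+61755380117`.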

PLEASE NOTE: In my sample files I have not formatted the phone numbers.
The data of the Agencies needs to be saved into one spreadsheet. Download a sample file here: my-sample-agencies-nu.xls

 

 


Scraping the data for the Agents.

Clicking a link in the list of Agencies takes you to the page where the Agency details are collected. Clicking the “List of Agents” link there will take you to the pages where the Agents data is collected.

We need to collect all the agents for every agency. The data of the Agents needs to be saved into a separate xls spreadsheet. Download a sample file here: my-sample-agents-NU.xls

Below is a screenshot of one of the larger Agencies with 70 agents.


De-duplication

There are some duplicates in the lists of agents, so the Agents data needs to be de-duplicated. Some of the duplicated agents have an Email but no Mobile. We need to keep the duplicates that have both the Email and the Mobile.
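The de-duplication rule could be sketched as below. This assumes each agent is held as a dictionary and that two records are duplicates when they share the same agency and agent name; the keys `agency`, `name`, `email` and `mobile` are hypothetical and would need to match the real column names.

```python
from typing import Dict, Iterable, List, Tuple

Agent = Dict[str, str]


def completeness(agent: Agent) -> int:
    """Count how many of the email and mobile fields are filled in."""
    return sum(1 for f in ("email", "mobile") if agent.get(f, "").strip())


def dedupe_agents(agents: Iterable[Agent]) -> List[Agent]:
    """Keep one record per (agency, name), preferring the most complete."""
    best: Dict[Tuple[str, str], Agent] = {}
    for agent in agents:
        key = (agent.get("agency", "").strip().lower(),
               agent.get("name", "").strip().lower())
        current = best.get(key)
        if current is None or completeness(agent) > completeness(current):
            best[key] = agent
    return list(best.values())
```

When two duplicates are equally complete, this sketch keeps whichever was seen first.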


When you have the scraper code written, supply 4 xls files to me and I will check the data. Once I confirm the data scrape is correct, run the code and collect all the data for me. The 4 files to supply are:

  1. Two xls files of the Agencies from postcodes 4217 and 4350, and
  2. Two xls files of all the Agents from postcodes 4217 and 4350.

Please test the emails and mark the records with incorrect emails.
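As a starting point, a basic format check could be used to mark records, as sketched below. This only tests the shape of the address, not whether the mailbox exists, and the pattern, function name and status labels are assumptions for illustration.

```python
import re

# Deliberately simple pattern: something@something.tld with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]+$")


def email_status(email: str) -> str:
    """Return 'missing', 'ok' or 'invalid' for an email field."""
    email = email.strip()
    if not email:
        return "missing"
    return "ok" if EMAIL_RE.match(email) else "invalid"
```

The returned status could be written to an extra column so that incorrect emails are easy to filter in the spreadsheet.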

Looking forward to hearing from you.