Page Objects for Web Data Extraction | Umair Ahmed

Wednesday, April 3, 12:00pm - 1:15pm (EDT)

Times shown in UTC-04:00 (America/New York)

This talk will first elaborate on the concept of Page Objects, a pattern described by Martin Fowler that was originally introduced to automate the testing of web pages. It will then introduce Zyte's idea of using Page Objects for web scraping and discuss the motivations behind it, namely:

Pluggable: you write a simple, generic scraper, plug in a Page Object, and it works
Portable: Page Objects must be easy to distribute as a Python package and adopt in any Scrapy project or other Python project
Reusable: the same Page Object can be used by many different projects and easily adopted by any existing web scraping project
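
To make the "pluggable" idea concrete, here is a minimal, stdlib-only Python sketch under toy assumptions: each site gets a small page object class that knows how to extract a product from that site's HTML, and a generic, site-agnostic scraper picks the right class by domain. All names (`ProductPage`, `register`, `generic_scrape`, the `shop-a.example` / `shop-b.example` domains) are hypothetical illustrations; this is not the web-poet API, which the talk itself covers.

```python
# Hypothetical sketch of the Page Object idea for scraping (NOT the web-poet API).
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, Type
from urllib.parse import urlparse


class _FirstTagText(HTMLParser):
    """Collects the text of the first element with the given tag (no nesting support)."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag
        self._inside = False
        self._done = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == self.tag and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self.text += data


def first_text(html: str, tag: str) -> str:
    parser = _FirstTagText(tag)
    parser.feed(html)
    parser.close()
    return parser.text.strip()


@dataclass
class Product:
    name: str
    price: str


class ProductPage:
    """Base page object: wraps one downloaded page and hides site-specific extraction."""

    def __init__(self, url: str, html: str) -> None:
        self.url = url
        self.html = html

    def to_item(self) -> Product:
        raise NotImplementedError


# Hypothetical registry mapping a domain to the page object that handles it.
PAGE_OBJECTS: Dict[str, Type[ProductPage]] = {}


def register(domain: str) -> Callable[[Type[ProductPage]], Type[ProductPage]]:
    """Class decorator that 'plugs in' a page object for a domain."""
    def decorator(cls: Type[ProductPage]) -> Type[ProductPage]:
        PAGE_OBJECTS[domain] = cls
        return cls
    return decorator


@register("shop-a.example")
class ShopAProductPage(ProductPage):
    def to_item(self) -> Product:
        return Product(name=first_text(self.html, "h1"), price=first_text(self.html, "strong"))


@register("shop-b.example")
class ShopBProductPage(ProductPage):
    def to_item(self) -> Product:
        return Product(name=first_text(self.html, "h2"), price=first_text(self.html, "em"))


def generic_scrape(url: str, html: str) -> Product:
    """Site-agnostic scraper: it only looks up and calls the matching page object."""
    page_cls = PAGE_OBJECTS[urlparse(url).netloc]
    return page_cls(url, html).to_item()
```

For example, `generic_scrape("https://shop-a.example/p/1", "<h1>Kettle</h1><strong>$20</strong>")` returns `Product(name='Kettle', price='$20')`. Supporting a new site means adding one registered class; the generic scraper does not change, and the page object classes could be shipped in a separate package and reused elsewhere, which is the portability and reusability argument in miniature.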

The talk will then introduce web-poet, an open-source Python package developed by Zyte for using Page Objects in web data extraction. It will illustrate the Page Object idea with code snippets and discuss the package's main features and the APIs it offers to developers.

Finally, the talk will conclude with examples of using web-poet with Scrapy, the most popular Python framework for web scraping. This section will introduce scrapy-poet, Zyte's open-source Python package for applying the Page Object technique within Scrapy specifically.

Zoom Link - https://zyte.zoom.us/j/85161132569?pwd=bd5xbyVf3tXnP81wbpH3nz1WU1j5E4.1

Zyte, media@scrapinghub.com