This talk will first explain the concept of page objects, based on Martin Fowler's idea that was originally introduced for automating the testing of web pages. It will then introduce the new idea developed by Zyte of using page objects for web scraping, and discuss the motivations behind it, namely:
- Pluggable: You write a simple generic scraper, plug a page object into it, and it works
- Portable: Page objects must be easy to distribute as a Python package and to adopt in any Scrapy project or other Python project
- Reusable: The same page object can be used by many different projects, including existing web scraping projects
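The three motivations above can be illustrated with a minimal, library-free sketch. It is not web-poet's API: the `PageObject` protocol, the two page classes, the `scrape` function, and the example URLs and HTML are all hypothetical names chosen for illustration, and regular expressions stand in for a real HTML parser.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class PageObject(Protocol):
    """Anything that can turn one downloaded page into an item."""

    def to_item(self) -> dict: ...


def _first_group(pattern: str, html: str) -> Optional[str]:
    # Toy extraction helper; a real page object would use a proper HTML parser.
    match = re.search(pattern, html)
    return match.group(1) if match else None


@dataclass
class ShopAProductPage:
    # Hypothetical page object for a site that puts the product name in <h1>.
    url: str
    html: str

    def to_item(self) -> dict:
        return {"url": self.url, "name": _first_group(r"<h1>(.*?)</h1>", self.html)}


@dataclass
class ShopBProductPage:
    # Hypothetical page object for a site with a different layout.
    url: str
    html: str

    def to_item(self) -> dict:
        pattern = r'<span class="title">(.*?)</span>'
        return {"url": self.url, "name": _first_group(pattern, self.html)}


def scrape(page_cls: Callable[..., PageObject], url: str, html: str) -> dict:
    # The generic scraper knows nothing about any site's layout:
    # all site-specific logic lives in the page object that is plugged in.
    return page_cls(url=url, html=html).to_item()


item_a = scrape(ShopAProductPage, "https://shop-a.example/p/1", "<h1>Kettle</h1>")
item_b = scrape(ShopBProductPage, "https://shop-b.example/p/9", '<span class="title">Mug</span>')
print(item_a)
print(item_b)
```

Because `scrape` depends only on the `to_item` method, the page classes could live in a separate Python package and be reused by any project that provides a URL and the page's HTML.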
Next, the talk will introduce web-poet, the open-source Python package developed by Zyte for using page objects for web data extraction. The idea of page objects will be illustrated with code snippets, and the package's features and the APIs it offers developers will be discussed.
Finally, the talk will conclude with examples of using web-poet with Scrapy, the most popular Python framework for web scraping. This part will introduce scrapy-poet, Zyte's open-source Python package for using the page object technique with Scrapy specifically.
Zoom Link -
https://zyte.zoom.us/j/85161132569?pwd=bd5xbyVf3tXnP81wbpH3nz1WU1j5E4.1