Parsers package
Submodules
Parsers.NYTimes module
class Parsers.NYTimes.NYTimes
Bases: Parsers.Parsers
static get_article_publish_date(webpage)
Parses webpage to return the date the article was published.
Parameters: webpage –
Returns: Article publish date
Return type: DateTime object
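As an illustration of what a publish-date parser might do, the sketch below reads a date from a `<meta>` tag and converts it to a `datetime`. It is not the package's implementation: the helper names (`_MetaDateFinder`, `sketch_publish_date`), the `article:published_time` property, and the assumption that `webpage` is an HTML string are all hypothetical.

```python
from datetime import datetime
from html.parser import HTMLParser


class _MetaDateFinder(HTMLParser):
    """Records the content of the first <meta> tag with the given property."""

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and self.value is None and attrs.get("property") == self.prop:
            self.value = attrs.get("content")


def sketch_publish_date(webpage, prop="article:published_time"):
    """Return a datetime parsed from the meta tag, or None if the tag is absent.

    Assumes `webpage` is an HTML string and the date is in ISO 8601 format.
    """
    finder = _MetaDateFinder(prop)
    finder.feed(webpage)
    if finder.value is None:
        return None
    return datetime.fromisoformat(finder.value)


sample = (
    '<html><head><meta property="article:published_time" '
    'content="2018-03-01T09:30:00+00:00"></head></html>'
)
print(sketch_publish_date(sample))
```

Real article pages vary in where they record the date, so a production parser would likely need site-specific selectors and fallbacks.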
static get_article_publisher(webpage, url)
Parses the webpage and/or URL to return the publisher of an article.
Parameters: - webpage –
- url –
Returns: Article publisher, e.g. “The New York Times”
Return type: str
static get_article_section(webpage, url)
Parses the webpage and/or URL to return the list of sections and subsections the article belongs to.
Parameters: - webpage –
- url –
Returns: List of section names, ordered from the narrowest section to the broadest
Return type: list
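The narrowest-to-broadest ordering can be illustrated with a small sketch that derives sections from the URL path alone. This is only an assumption about one possible approach, not the parser's actual logic: `sketch_sections_from_url` is a hypothetical helper, and it assumes the last path segment is the article slug and that purely numeric segments are date parts.

```python
from urllib.parse import urlparse


def sketch_sections_from_url(url):
    """Return path-based section names, narrowest section first."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Drop the final segment (assumed to be the article slug) and
    # purely numeric segments (assumed to be date components).
    sections = [s for s in segments[:-1] if not s.isdigit()]
    # The path lists the broadest section first, so reverse it.
    return list(reversed(sections))


print(sketch_sections_from_url(
    "https://www.example.com/2018/03/01/world/europe/some-article.html"
))
```

With the example URL above, the path lists `world` before `europe`, so the reversed result places `europe` first.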
static get_article_sources(webpage)
Parses webpage to extract all sources from an article.
Parameters: webpage –
Returns: List of sources, typically as URLs
Return type: list
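Since sources are typically returned as URLs, one plausible approach is to collect the absolute links in the article markup. The sketch below does that with the standard library; `_LinkCollector` and `sketch_sources` are hypothetical names, `webpage` is assumed to be an HTML string, and the real parser may well restrict itself to links inside the article body.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self.links.append(href)


def sketch_sources(webpage):
    """Return the absolute links in `webpage`, de-duplicated in first-seen order."""
    collector = _LinkCollector()
    collector.feed(webpage)
    # dict.fromkeys preserves insertion order while dropping duplicates.
    return list(dict.fromkeys(collector.links))


sample_links = (
    '<p><a href="https://a.example/x">x</a> <a href="/relative">r</a> '
    '<a href="https://a.example/x">again</a> <a href="http://b.example/">b</a></p>'
)
print(sketch_sources(sample_links))
```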
static get_article_text(webpage)
Parses webpage to return the full plaintext of the article.
Parameters: webpage –
Returns: Plaintext of article
Return type: str
static get_article_title(webpage)
Parses webpage to return the title/headline of an article.
Parameters: webpage –
Returns: Article headline
Return type: str
Parsers.TheGuardian module
class Parsers.TheGuardian.TheGuardian
Bases: Parsers.Parsers
static get_article_publish_date(webpage)
Parses webpage to return the date the article was published.
Parameters: webpage –
Returns: Article publish date
Return type: DateTime object
static get_article_section(webpage, url)
Parses the webpage and/or URL to return the list of sections and subsections the article belongs to.
Parameters: - webpage –
- url –
Returns: List of section names, ordered from the narrowest section to the broadest
Return type: list
static get_article_sources(webpage)
Parses webpage to extract all sources from an article.
Parameters: webpage –
Returns: List of sources, typically as URLs
Return type: list
static get_article_text(webpage)
Parses webpage to return the full plaintext of the article.
Parameters: webpage –
Returns: Plaintext of article
Return type: str
Parsers.TheIndependent module
class Parsers.TheIndependent.TheIndependent
Bases: Parsers.Parsers
Parses webpage to return the author of the article.
Parameters: webpage –
Returns: Author of the article
Return type: str
static get_article_section(webpage, url)
Parses the webpage and/or URL to return the list of sections and subsections the article belongs to.
Parameters: - webpage –
- url –
Returns: List of section names, ordered from the narrowest section to the broadest
Return type: list
static get_article_sources(webpage)
Parses webpage to extract all sources from an article.
Parameters: webpage –
Returns: List of sources, typically as URLs
Return type: list
static get_article_subtitle(webpage)
Parses webpage to return the subtitle of an article.
Parameters: webpage –
Returns: Article subtitle
Return type: str
static get_article_text(webpage)
Parses webpage to return the full plaintext of the article.
Parameters: webpage –
Returns: Plaintext of article
Return type: str
Parsers.WashingtonPost module
class Parsers.WashingtonPost.WashingtonPost
Bases: Parsers.Parsers
Parses webpage to return the author of the article.
Parameters: webpage –
Returns: Author of the article
Return type: str
static get_article_publish_date(webpage)
Parses webpage to return the date the article was published.
Parameters: webpage –
Returns: Article publish date
Return type: DateTime object
static get_article_publisher(webpage, url)
Parses the webpage and/or URL to return the publisher of an article.
Parameters: - webpage –
- url –
Returns: Article publisher, e.g. “The New York Times”
Return type: str
static get_article_section(webpage, url)
Parses the webpage and/or URL to return the list of sections and subsections the article belongs to.
Parameters: - webpage –
- url –
Returns: List of section names, ordered from the narrowest section to the broadest
Return type: list
static get_article_sources(webpage)
Parses webpage to extract all sources from an article.
Parameters: webpage –
Returns: List of sources, typically as URLs
Return type: list
static get_article_text(webpage)
Parses webpage to return the full plaintext of the article.
Parameters: webpage –
Returns: Plaintext of article
Return type: str
static get_article_title(webpage)
Parses webpage to return the title/headline of an article.
Parameters: webpage –
Returns: Article headline
Return type: str
Module contents
class Parsers.Parsers
Bases: object
Parses webpage to return the author of the article.
Parameters: webpage –
Returns: Author of the article
Return type: str
static get_article_publish_date(webpage)
Parses webpage to return the date the article was published.
Parameters: webpage –
Returns: Article publish date
Return type: DateTime object
static get_article_publisher(webpage, url)
Parses the webpage and/or URL to return the publisher of an article.
Parameters: - webpage –
- url –
Returns: Article publisher, e.g. “The New York Times”
Return type: str
static get_article_section(webpage, url)
Parses the webpage and/or URL to return the list of sections and subsections the article belongs to.
Parameters: - webpage –
- url –
Returns: List of section names, ordered from the narrowest section to the broadest
Return type: list
static get_article_sources(webpage)
Parses webpage to extract all sources from an article.
Parameters: webpage –
Returns: List of sources, typically as URLs
Return type: list
static get_article_subtitle(webpage)
Parses webpage to return the subtitle of an article.
Parameters: webpage –
Returns: Article subtitle
Return type: str
static get_article_text(webpage)
Parses webpage to return the full plaintext of the article.
Parameters: webpage –
Returns: Plaintext of article
Return type: str
static get_article_title(webpage)
Parses webpage to return the title/headline of an article.
Parameters: webpage –
Returns: Article headline
Return type: str
url_recognized(url)
Checks whether this parser can parse a given URL.
Parameters: url – URL to check
Returns: True if this parser can parse the given URL
Return type: bool
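A check like `url_recognized` lends itself to choosing a parser for an arbitrary article URL. The sketch below shows that dispatch pattern with stand-in classes; it does not use the package's code. `SketchParser`, its subclasses, the `domain` attribute, `pick_parser`, and the choice to make `url_recognized` a classmethod matching on the host name are all assumptions made for illustration.

```python
from urllib.parse import urlparse


class SketchParser:
    """Hypothetical stand-in for the Parsers base class (not the package's code)."""

    domain = ""  # assumption: each parser is tied to one site domain

    @classmethod
    def url_recognized(cls, url):
        """Return True if the URL's host is this parser's domain or a subdomain of it."""
        host = urlparse(url).netloc.lower()
        return host == cls.domain or host.endswith("." + cls.domain)


class SketchNYTimes(SketchParser):
    domain = "nytimes.com"


class SketchGuardian(SketchParser):
    domain = "theguardian.com"


def pick_parser(url, parsers):
    """Return the first parser class that recognizes the URL, or None."""
    for parser in parsers:
        if parser.url_recognized(url):
            return parser
    return None


parsers = [SketchNYTimes, SketchGuardian]
chosen = pick_parser("https://www.theguardian.com/world/2018/mar/01/example", parsers)
print(chosen.__name__ if chosen else "no parser recognizes this URL")
```

Once a parser is chosen, its static methods such as `get_article_title(webpage)` can be called on the fetched page; returning `None` for unrecognized URLs leaves the fallback decision to the caller.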