Parsers package
Submodules
Parsers.NYTimes module
class Parsers.NYTimes.NYTimes
Bases: Parsers.Parsers
- static get_article_publish_date(webpage)
  Parses webpage to return the date the article was published.
  Parameters: webpage –
  Returns: Article publish date
  Return type: DateTime object
- static get_article_publisher(webpage, url)
  Parses webpage and/or url to return the publisher of an article.
  Parameters:
    - webpage –
    - url –
  Returns: Article publisher, e.g. “The New York Times”
  Return type: str
- static get_article_section(webpage, url)
  Parses webpage and/or url to return the sections and subsections the article belongs to.
  Parameters:
    - webpage –
    - url –
  Returns: List of section names, ordered from the narrowest section to the broadest
  Return type: list
- static get_article_sources(webpage)
  Parses webpage to extract all sources from an article.
  Parameters: webpage –
  Returns: List of sources, typically URLs of the sources
  Return type: list
- static get_article_text(webpage)
  Parses webpage to return the full plaintext of the article.
  Parameters: webpage –
  Returns: Plaintext of the article
  Return type: str
- static get_article_title(webpage)
  Parses webpage to return the title/headline of an article.
  Parameters: webpage –
  Returns: Article headline
  Return type: str
Parsers.TheGuardian module
class Parsers.TheGuardian.TheGuardian
Bases: Parsers.Parsers
- static get_article_publish_date(webpage)
  Parses webpage to return the date the article was published.
  Parameters: webpage –
  Returns: Article publish date
  Return type: DateTime object
- static get_article_section(webpage, url)
  Parses webpage and/or url to return the sections and subsections the article belongs to.
  Parameters:
    - webpage –
    - url –
  Returns: List of section names, ordered from the narrowest section to the broadest
  Return type: list
- static get_article_sources(webpage)
  Parses webpage to extract all sources from an article.
  Parameters: webpage –
  Returns: List of sources, typically URLs of the sources
  Return type: list
- static get_article_text(webpage)
  Parses webpage to return the full plaintext of the article.
  Parameters: webpage –
  Returns: Plaintext of the article
  Return type: str
Parsers.TheIndependent module
class Parsers.TheIndependent.TheIndependent
Bases: Parsers.Parsers
- Parses webpage to return the author of the article.
  Parameters: webpage –
  Returns: Author of the article
  Return type: str
- static get_article_section(webpage, url)
  Parses webpage and/or url to return the sections and subsections the article belongs to.
  Parameters:
    - webpage –
    - url –
  Returns: List of section names, ordered from the narrowest section to the broadest
  Return type: list
- static get_article_sources(webpage)
  Parses webpage to extract all sources from an article.
  Parameters: webpage –
  Returns: List of sources, typically URLs of the sources
  Return type: list
- static get_article_subtitle(webpage)
  Parses webpage to return the subtitle of an article.
  Parameters: webpage –
  Returns: Article subtitle
  Return type: str
- static get_article_text(webpage)
  Parses webpage to return the full plaintext of the article.
  Parameters: webpage –
  Returns: Plaintext of the article
  Return type: str
Parsers.WashingtonPost module
class Parsers.WashingtonPost.WashingtonPost
Bases: Parsers.Parsers
- Parses webpage to return the author of the article.
  Parameters: webpage –
  Returns: Author of the article
  Return type: str
- static get_article_publish_date(webpage)
  Parses webpage to return the date the article was published.
  Parameters: webpage –
  Returns: Article publish date
  Return type: DateTime object
- static get_article_publisher(webpage, url)
  Parses webpage and/or url to return the publisher of an article.
  Parameters:
    - webpage –
    - url –
  Returns: Article publisher, e.g. “The New York Times”
  Return type: str
- static get_article_section(webpage, url)
  Parses webpage and/or url to return the sections and subsections the article belongs to.
  Parameters:
    - webpage –
    - url –
  Returns: List of section names, ordered from the narrowest section to the broadest
  Return type: list
- static get_article_sources(webpage)
  Parses webpage to extract all sources from an article.
  Parameters: webpage –
  Returns: List of sources, typically URLs of the sources
  Return type: list
- static get_article_text(webpage)
  Parses webpage to return the full plaintext of the article.
  Parameters: webpage –
  Returns: Plaintext of the article
  Return type: str
- static get_article_title(webpage)
  Parses webpage to return the title/headline of an article.
  Parameters: webpage –
  Returns: Article headline
  Return type: str
Module contents
class Parsers.Parsers
Bases: object
- Parses webpage to return the author of the article.
  Parameters: webpage –
  Returns: Author of the article
  Return type: str
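One way a parser could locate the author is to read the page's metadata. The sketch below assumes that `webpage` is raw HTML text and that the byline is published in a `<meta name="author">` tag; neither assumption is confirmed by this reference, and `extract_author` is a hypothetical helper, not part of the Parsers package. It uses only the standard-library `html.parser` module.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class _MetaCollector(HTMLParser):
    """Collects <meta> tags keyed by their name= (or property=) attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content is not None:
            # Keep only the first occurrence of each key.
            self.meta.setdefault(key.lower(), content.strip())


def extract_author(webpage: str) -> Optional[str]:
    """Hypothetical helper (assumes HTML input): return <meta name="author">."""
    collector = _MetaCollector()
    collector.feed(webpage)
    collector.close()
    return collector.meta.get("author")


page = '<html><head><meta name="author" content="Jane Doe"></head><body></body></html>'
print(extract_author(page))  # Jane Doe
```

Real sites vary in where they put the byline, so a production parser would likely need site-specific rules, which is presumably why each subclass provides its own methods.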
- static get_article_publish_date(webpage)
  Parses webpage to return the date the article was published.
  Parameters: webpage –
  Returns: Article publish date
  Return type: DateTime object
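A minimal sketch of producing a datetime object from a page, assuming (hypothetically) that `webpage` is an HTML string whose publish date appears as an ISO 8601 `datetime` attribute on a `<time>` element. `extract_publish_date` is an illustrative name, not the package's API.

```python
from datetime import datetime
from html.parser import HTMLParser
from typing import Optional


class _FirstTimeTag(HTMLParser):
    """Remembers the datetime attribute of the first <time> element that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "time" and self.value is None:
            self.value = dict(attrs).get("datetime")


def extract_publish_date(webpage: str) -> Optional[datetime]:
    """Hypothetical helper: parse the first <time datetime="..."> as a datetime."""
    finder = _FirstTimeTag()
    finder.feed(webpage)
    finder.close()
    if not finder.value:
        return None
    stamp = finder.value.strip()
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    return datetime.fromisoformat(stamp)


page = '<article><time datetime="2020-01-05T12:30:00Z">Jan. 5, 2020</time></article>'
print(extract_publish_date(page))  # 2020-01-05 12:30:00+00:00
```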
- static get_article_publisher(webpage, url)
  Parses webpage and/or url to return the publisher of an article.
  Parameters:
    - webpage –
    - url –
  Returns: Article publisher, e.g. “The New York Times”
  Return type: str
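Because the publisher can come from the url alone, a simple approach is to match the URL's host against known domains. The sketch below is a hypothetical illustration: the `PUBLISHERS` table and `publisher_from_url` are assumptions, with entries chosen to match the four sites documented on this page.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical lookup table; the real package may identify publishers differently.
PUBLISHERS = {
    "nytimes.com": "The New York Times",
    "theguardian.com": "The Guardian",
    "independent.co.uk": "The Independent",
    "washingtonpost.com": "The Washington Post",
}


def publisher_from_url(url: str) -> Optional[str]:
    """Return the publisher whose domain matches the URL's host, if any."""
    host = (urlparse(url).hostname or "").lower()
    for domain, name in PUBLISHERS.items():
        # Match the bare domain or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return name
    return None


print(publisher_from_url("https://www.nytimes.com/2020/01/05/us/politics/example.html"))
# The New York Times
```

Checking `host.endswith("." + domain)` rather than a plain substring test keeps look-alike hosts such as `notnytimes.com` from matching.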
- static get_article_section(webpage, url)
  Parses webpage and/or url to return the sections and subsections the article belongs to.
  Parameters:
    - webpage –
    - url –
  Returns: List of section names, ordered from the narrowest section to the broadest
  Return type: list
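To illustrate the "narrowest to broadest" ordering, here is a sketch that reads sections from the URL path. It assumes, hypothetically, paths shaped like `/YYYY/MM/DD/<section>/<subsection>/<slug>.html`; other sites lay out their URLs differently, and `sections_from_url` is not part of the package.

```python
from typing import List
from urllib.parse import urlparse


def sections_from_url(url: str) -> List[str]:
    """Hypothetical helper: read section names from the URL path.

    Assumes /YYYY/MM/DD/<section>/<subsection>/<slug>.html and returns
    the sections ordered from narrowest to broadest.
    """
    segments = [part for part in urlparse(url).path.split("/") if part]
    if segments:
        segments = segments[:-1]  # drop the trailing article slug
    # Skip purely numeric date components such as 2020, 01, 05.
    sections = [part for part in segments if not part.isdigit()]
    sections.reverse()  # the deepest path segment is the narrowest section
    return sections


print(sections_from_url("https://www.nytimes.com/2020/01/05/us/politics/example.html"))
# ['politics', 'us']
```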
- static get_article_sources(webpage)
  Parses webpage to extract all sources from an article.
  Parameters: webpage –
  Returns: List of sources, typically URLs of the sources
  Return type: list
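Since sources are typically URLs, one plausible approach is to collect the absolute links in the page. This sketch assumes `webpage` is an HTML string and treats every absolute `http(s)` link in an `<a href>` as a source, de-duplicated in page order; that rule and the name `extract_sources` are assumptions for illustration only.

```python
from html.parser import HTMLParser
from typing import List


class _LinkCollector(HTMLParser):
    """Collects absolute http(s) links from <a href="..."> tags, in page order."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore relative links and skip URLs already collected.
        if href and href.startswith(("http://", "https://")) and href not in self.links:
            self.links.append(href)


def extract_sources(webpage: str) -> List[str]:
    """Hypothetical helper: treat every linked absolute URL as a source."""
    collector = _LinkCollector()
    collector.feed(webpage)
    collector.close()
    return collector.links


page = (
    '<p>According to <a href="https://example.org/report">a report</a> and '
    '<a href="/local/page">another page</a>, see also '
    '<a href="https://example.org/report">the report</a>.</p>'
)
print(extract_sources(page))  # ['https://example.org/report']
```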
- static get_article_subtitle(webpage)
  Parses webpage to return the subtitle of an article.
  Parameters: webpage –
  Returns: Article subtitle
  Return type: str
- static get_article_text(webpage)
  Parses webpage to return the full plaintext of the article.
  Parameters: webpage –
  Returns: Plaintext of the article
  Return type: str
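A rough sketch of turning article HTML into plaintext, assuming (hypothetically) that `webpage` is an HTML string and that the body text lives in `<p>` elements. Text outside paragraphs, such as headings or ad containers, is dropped, and paragraphs are separated by blank lines. `extract_text` is an illustrative name only.

```python
from html.parser import HTMLParser
from typing import List


class _ParagraphText(HTMLParser):
    """Accumulates the text inside each <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._in_paragraph = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self._in_paragraph = False
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_text(webpage: str) -> str:
    """Hypothetical helper: join the text of all <p> elements with blank lines."""
    parser = _ParagraphText()
    parser.feed(webpage)
    parser.close()
    return "\n\n".join(parser.paragraphs)


page = (
    '<h1>Headline</h1><p>First   paragraph with <a href="https://example.org">a link</a>.</p>'
    "<div>Advertisement</div><p>Second paragraph.</p>"
)
print(extract_text(page))
```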
- static get_article_title(webpage)
  Parses webpage to return the title/headline of an article.
  Parameters: webpage –
  Returns: Article headline
  Return type: str
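A minimal sketch of headline extraction, assuming `webpage` is an HTML string. It prefers the first `<h1>` and falls back to the document `<title>`, which often carries a site-name suffix. This preference order and the name `extract_title` are assumptions, not documented behavior of the package.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _HeadlineFinder(HTMLParser):
    """Captures the text of the first <h1> and of the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self.h1: Optional[str] = None
        self.title: Optional[str] = None
        self._current: Optional[str] = None  # tag whose text is being captured
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if (tag == "h1" and self.h1 is None) or (tag == "title" and self.title is None):
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())
            if tag == "h1":
                self.h1 = text
            else:
                self.title = text
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def extract_title(webpage: str) -> Optional[str]:
    """Hypothetical helper: prefer the first <h1>, fall back to <title>."""
    finder = _HeadlineFinder()
    finder.feed(webpage)
    finder.close()
    return finder.h1 or finder.title


page = (
    "<html><head><title>Example headline - Example News</title></head>"
    "<body><h1>Example <em>headline</em></h1></body></html>"
)
print(extract_title(page))  # Example headline
```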
- url_recognized(url)
  Checks whether this parser can parse the given URL.
  Parameters: url – URL to check against this parser
  Returns: True if this parser can parse the given URL
  Return type: Boolean
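Given a base class with `url_recognized` and one subclass per site, a caller can pick the right parser by asking each one in turn. The classes below are hypothetical stand-ins that mirror the documented layout (a base class plus site subclasses). They are not the package's code, and the `DOMAINS` attribute is an assumption about how recognition might be implemented.

```python
from typing import List, Optional, Tuple, Type
from urllib.parse import urlparse


class BaseParser:
    """Hypothetical stand-in for a base parser class; not the package's code."""

    # Domains a subclass can handle; each subclass overrides this.
    DOMAINS: Tuple[str, ...] = ()

    def url_recognized(self, url: str) -> bool:
        host = (urlparse(url).hostname or "").lower()
        return any(host == d or host.endswith("." + d) for d in self.DOMAINS)


class NYTimesParser(BaseParser):
    DOMAINS = ("nytimes.com",)


class GuardianParser(BaseParser):
    DOMAINS = ("theguardian.com",)


def pick_parser(url: str, candidates: List[Type[BaseParser]]) -> Optional[BaseParser]:
    """Return an instance of the first parser class that recognizes the URL."""
    for cls in candidates:
        parser = cls()
        if parser.url_recognized(url):
            return parser
    return None


parser = pick_parser("https://www.theguardian.com/world/example", [NYTimesParser, GuardianParser])
print(type(parser).__name__)  # GuardianParser
```

Keeping recognition on each subclass means adding a site only requires a new subclass, with no change to the dispatch loop.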