Tidied up project structure, improved table links parsing

Leonardo Cavaletti 2020-05-23 17:32:53 +01:00
parent 4f507e4485
commit bd76bc3089
4 changed files with 202 additions and 146 deletions

@ -1,27 +1,32 @@
# Loconotion
**Loconotion** is a Python script that parses a [Notion.so](https://notion.so) public page (along with all of its subpages) and generates a static site out of it.
## But Why?
[Notion](https://notion.so) is a web app where you can create your own workspace / personal wiki out of content blocks. It feels good to use, and the results look very pretty - the developers did a great job. Given that it also offers the possibility of making a page (and its sub-pages) public on the web, several people choose to use Notion to manage their personal blog, portfolio, or some kind of simple website. Sadly Notion does not support custom domains: your public pages are stuck in the `notion.so` domain, under long computer-generated URLs.
Some services like Super, HostingPotion, HostNotion and Fruition try to work around this issue by relying on a [clever hack](https://gist.github.com/mayneyao/b9fefc9625b76f70488e5d8c2a99315d) using CloudFlare workers. This solution, however, has some disadvantages:
- **Not free** - Super, HostingPotion and HostNotion all take a monthly fee since they manage all the "hacky bits" for you; Fruition is open-source but any domain with a decent amount of daily visits will soon clash against CloudFlare's free tier limitations, and force you to upgrade to the 5$ or more plan (plus you need to set up Cloudflare yourself)
- **Not free** - Super, HostingPotion and HostNotion all take a monthly fee since they manage all the "hacky bits" for you; Fruition is open-source but any domain with a decent amount of daily visits will soon clash against CloudFlare's free tier limitations, and force you to upgrade to the 5\$ or more plan (plus you need to set up Cloudflare yourself)
- **Slow-ish** - As the page is still hosted on Notion, it comes bundled with all their analytics, editing / collaboration javascript, vendors css, and more bloat which causes the page to load at speeds that are not exactly appropriate for a simple blog / website. Running [this](https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e) example page on Google's [PageSpeed Insights](https://developers.google.com/speed/pagespeed/insights/) scores a measly **24 - 66** on mobile / desktop.
- **Ugly URLs** - While the services above enable the use of custom domains, the URLs for individual pages are stuck with the long, ugly, original Notion URL (apart from Fruition - they got custom URLs figured out, although you will always see the original URL flashing for an instant when the page is loaded).
- **Notion Free Account Limitations** - Recently Notion introduced a change to its pricing model where public pages can't be set to be indexed by search engines on a free account (but they also removed the blocks count limitations, which is a good trade-off if you ask me)
Loconotion approaches this a bit differently. It lets Notion render the page, then scrapes it and saves a static version of the page to disk. This offers the following benefits:
- Strips out all the unnecessary bloat, like Notion's analytics, vendors scripts / styles, and javascript left in to enable collaboration.
- Caches all images / assets / fonts (hashing filenames), while keeping links intact.
- Cleans up the page URLs, letting you use custom slugs if desired
- Full meta tags controls, for the whole site or individual pages
- Granular custom Google Fonts control on headings, navbar, body and code blocks
- Granular custom Google Fonts control on headings, navbar, body and code blocks
- Lets you inject any custom style or script, from custom analytics or real-time chat support to hidden crypto miners (please don't do that)
- Outputs static files ready to be deployed on Netlify, GitHub Pages, Vercel, your Raspberry PI, that cheap second-hand Thinkpad you're using as a random server - you name it.
The result? A faster, self-contained version of the page that keeps all of Notion's nice layouts and eye candies. For comparison, the same example page parsed with Loconotion and deployed on Netlify's free tier achieves a PageSpeed Insights score of **96 - 100**!
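The "hashing filenames" part of the caching step can be pictured with a minimal sketch like the one below. It is only an illustration under stated assumptions: `hashed_filename` is a hypothetical helper, and the choice of SHA-1 over the asset URL is an assumption, not necessarily what Loconotion does internally.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse


def hashed_filename(url: str) -> str:
    """Return a stable, filesystem-safe name for the asset found at `url`.

    Hypothetical helper: the hash algorithm and naming scheme are assumptions.
    """
    # Keep the original extension (if any) so the cached file keeps a sensible type.
    extension = PurePosixPath(urlparse(url).path).suffix
    # Hash the whole URL so two assets with the same basename never collide.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"{digest}{extension}"


print(hashed_filename("https://www.notion.so/images/page-cover/woodcuts_1.jpg"))
```

Because the name depends only on the URL, running the script twice maps the same asset to the same cached file, which is what makes a cache reusable between runs.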
Bear in mind that as we are effectively parsing a static version of the page, there are some limitations compared to Notion's live public pages:
- All pages will open in their own page and not modals (depending on how you look at it this could be a plus)
- Databases will be presented in their initial view - for example, no switching views from table to gallery and such
- All editing features will be disabled - no ticking checkboxes or dragging kanban boards cards around. Usually not an issue since a public page to serve as a website would have changes locked.
@ -30,25 +35,30 @@ Bear in mind that as we are effectively parsing a static version of the page, th
Everything else should be fine. Loconotion rebuilds the logic for toggle boxes and embeds so they still work; plus it defines some additional CSS rules to enable mobile responsiveness across the whole site (in some cases looking even better than Notion's defaults, which weren't exactly designed for mobile).
### But Notion already had an html export function?
It does, but I wasn't really happy with the styling - the pages looked a bit uglier than what they look like on a live Notion page. Plus, it doesn't support all the cool customization features outlined above!
## Installation & Requirements
`pip install -r requirements.txt`
This script uses [ChromeDriver](https://chromedriver.chromium.org) to automate the Google Chrome browser - therefore Google Chrome needs to be installed in order to work.
The script comes bundled with the default Windows chromedriver executable. On Mac / Linux, download the right distribution for you from https://chromedriver.chromium.org/downloads and place the executable in this folder. Alternatively, use the `--chromedriver` argument to specify its path at runtime
The script comes bundled with the default Windows chromedriver executable. On Mac / Linux, download the right distribution for you from https://chromedriver.chromium.org/downloads and place the executable in this folder. Alternatively, use the `--chromedriver` argument to specify its path at runtime.
## Simple Usage
`python loconotion.py https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e`
`python loconotion https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e`
In its simplest form, the script takes the URL of a public Notion.so page, and generates the site inside the `dist` folder, based on the page's title (the above example will generate the site inside `dist\The-perfect-It-s-Always-Sunny-in-Philadelphia\`).
## Advanced Usage
You can fully configure Loconotion to your needs by passing a [.toml](https://github.com/toml-lang/toml) configuration file to the script instead:
`python loconotion.py example\example_site.toml`
`python loconotion example\example_site.toml`
Here's what a full configuration would look like, along with explanations for each parameter.
```toml
## Loconotion Site Configuration File ##
# full .toml configuration example file to showcase all of Loconotion's available settings
@ -102,18 +112,18 @@ page = "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
# followed by name of the tag to inject. Each key in the table maps to an attribute in the tag
# the following injects <link href="favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/> in the <head>
[[site.inject.head.link]]
rel="icon"
rel="icon"
sizes="16x16"
type="image/png"
href="/example/favicon-16x16.png"
# the following injects <script src="custom-script.js" type="text/javascript"></script> in the <body>
[[site.inject.body.script]]
type="text/javascript"
src="/example/custom-script.js"
## Individual Page Settings ##
# the [pages] table defines override settings for individual pages, by defining a sub-table named after the page url
# the [pages] table defines override settings for individual pages, by defining a sub-table named after the page url
# (or part of the url, but be careful not to use a string that appears in multiple page urls)
[pages]
# the following settings will only apply to this page: https://www.notion.so/d2fa06f244e64f66880bb0491f58223d
@ -124,13 +134,13 @@ page = "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
slug = "list"
# change the description meta tag for this page only
[[pages.d2fa06f244e64f66880bb0491f58223d.meta]]
[[pages.d2fa06f244e64f66880bb0491f58223d.meta]]
name = "description"
content = "A fullscreen list database page, now with a pretty slug"
# change the title font for this page only
[pages.d2fa06f244e64f66880bb0491f58223d.fonts]
title = 'Nunito'
title = 'Nunito'
# for smaller sets of settings you can use inline notation
# 2483a3a5c3fd445980c1adc8e550b552.slug = "gallery"
@ -139,6 +149,7 @@ page = "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
```
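To make the `[pages]` override rule more concrete, here is a small sketch of how a per-page lookup could work once the `.toml` file has been parsed into a dictionary. The helper name `page_overrides`, the shallow merge over the `[site]` table, and the site-wide `Roboto` font are all assumptions made for illustration; only the page id, the `list` slug and the `Nunito` title font come from the example above.

```python
def page_overrides(config: dict, url: str) -> dict:
    """Return the site settings, overridden by the first [pages] key found in `url`.

    Hypothetical helper: a shallow merge is assumed here for simplicity.
    """
    settings = dict(config.get("site", {}))
    for key, overrides in config.get("pages", {}).items():
        # Keys may be a full page url or just a part of it (such as the page id).
        if key in url:
            settings.update(overrides)
            break
    return settings


# Dictionary shaped like the parsed configuration (site-wide font is illustrative).
config = {
    "site": {"fonts": {"title": "Roboto"}},
    "pages": {
        "d2fa06f244e64f66880bb0491f58223d": {
            "slug": "list",
            "fonts": {"title": "Nunito"},
        },
    },
}

print(page_overrides(config, "https://www.notion.so/d2fa06f244e64f66880bb0491f58223d"))
```

Matching on a substring is also why a key that appears in several page urls is risky: the first match wins and the other pages silently pick up the same overrides.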
On top of this, the script can take these optional arguments:
```
--clean Delete all previously cached files for the site before generating it
-v, --verbose Shows way more exciting facts in the output
@ -146,16 +157,19 @@ On top of this, the script can take this optional arguments:
```
## Roadmap / Features wishlist
- [ ] Dark / custom themes
- [ ] Dark / light theme toggle
- [ ] Automated Netlify / GitHub pages / Vercel deployments
- [ ] Injectable custom HTML
- [ ] Html / css / js minification & images optimization
- [ ] GUI / sites manager, potentially as a paid add-on to rack up some $ - daddy needs money to get more second-hand thinkpads to use as random servers
- [ ] Custom theming
## Who uses this?
If you used Loconotion to build a cool site, shoot me a mail! I'd love to feature it in some sort of showcase.
## Sites built with Loconotion
- [leoncvlt.com](https://leoncvlt.com)
If you used Loconotion to build a cool site and want it added to the list above, shoot me a mail!
## Support
If you found this useful, and / or it saved you some money, consider using part of that saved money to buy me a coffee and maintain the balance in the universe.
<a href="https://www.buymeacoffee.com/leoncvlt" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/default-blue.png" alt="Buy Me A Coffee" style="height: 51px !important;width: 217px !important;" ></a>
![](https://img.shields.io/badge/-buy%20me%20a%20coffee-lightgrey?style=flat&logo=buy-me-a-coffee&color=FF813F&logoColor=white) If you found this useful, consider buying me a coffee so I get a nice dose of methylxanthine, and you get a nice dose of good karma.

102
loconotion/__main__.py Normal file

@ -0,0 +1,102 @@
import os
import sys
import logging
import urllib.parse
import argparse
from pathlib import Path

log = logging.getLogger("loconotion")

try:
    import requests
    import toml
except ModuleNotFoundError as error:
    log.critical(f"ModuleNotFoundError: {error}. Have you installed the requirements?")
    sys.exit()

from notionparser import Parser


def main():
    # set up argument parser
    argparser = argparse.ArgumentParser(description='Generate static websites from Notion.so pages')
    argparser.add_argument('target', help='The config file containing the site properties, or the url of the Notion.so page to generate the site from')
    argparser.add_argument('--chromedriver', help='Use a specific chromedriver executable instead of the auto-installing one')
    argparser.add_argument("--single-page", action="store_true", help="Only parse the first page, then stop")
    argparser.add_argument('--clean', action='store_true', help='Delete all previously cached files for the site before generating it')
    argparser.add_argument('--non-headless', action='store_true', help='Run chromedriver in non-headless mode')
    argparser.add_argument("-v", "--verbose", action="store_true", help="Increase output log verbosity")
    args = argparser.parse_args()

    # set up some pretty logs
    log = logging.getLogger("loconotion")
    log.setLevel(logging.INFO if not args.verbose else logging.DEBUG)
    log_screen_handler = logging.StreamHandler(stream=sys.stdout)
    log.addHandler(log_screen_handler)
    log.propagate = False
    try:
        import colorama, copy

        LOG_COLORS = {
            logging.DEBUG: colorama.Fore.GREEN,
            logging.INFO: colorama.Fore.BLUE,
            logging.WARNING: colorama.Fore.YELLOW,
            logging.ERROR: colorama.Fore.RED,
            logging.CRITICAL: colorama.Back.RED
        }

        class ColorFormatter(logging.Formatter):
            def format(self, record, *args, **kwargs):
                # if the corresponding logger has children, they may receive modified
                # record, so we want to keep it intact
                new_record = copy.copy(record)
                if new_record.levelno in LOG_COLORS:
                    new_record.levelname = "{color_begin}{level}{color_end}".format(
                        level=new_record.levelname,
                        color_begin=LOG_COLORS[new_record.levelno],
                        color_end=colorama.Style.RESET_ALL,
                    )
                return super(ColorFormatter, self).format(new_record, *args, **kwargs)

        log_screen_handler.setFormatter(ColorFormatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
            datefmt="{color_begin}[%H:%M:%S]{color_end}".format(
                color_begin=colorama.Style.DIM,
                color_end=colorama.Style.RESET_ALL
            )))
    except ModuleNotFoundError as identifier:
        # colorama is optional: fall back to plain, uncoloured logs
        pass

    # initialise and run the website parser
    try:
        if urllib.parse.urlparse(args.target).scheme:
            # the target has a url scheme: treat it as a page url
            try:
                response = requests.get(args.target)
                if ("notion.so" in args.target):
                    log.info("Initialising parser with simple page url")
                    config = { "page" : args.target }
                    Parser(config = config, args = vars(args))
                else:
                    log.critical(f"{args.target} is not a notion.so page")
            except requests.ConnectionError as exception:
                log.critical("Connection error")
        else:
            # otherwise treat the target as the path of a configuration file
            if Path(args.target).is_file():
                with open(args.target) as f:
                    parsed_config = toml.loads(f.read())
                    log.info("Initialising parser with configuration file")
                    log.debug(parsed_config)
                    Parser(config = parsed_config, args = vars(args))
            else:
                log.critical(f"Config file {args.target} does not exist")
    except FileNotFoundError as e:
        log.critical(f'FileNotFoundError: {e}')
    sys.exit(0)


if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        log.critical('Interrupted by user')
        try:
            sys.exit(0)
        except SystemExit:
            os._exit(0)
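The way `main()` decides between a page url and a configuration file can be isolated into a small, testable sketch. The function name `classify_target` and the string labels it returns are hypothetical and only serve the illustration; the checks themselves mirror the ones in the file above (`urlparse(...).scheme`, the `"notion.so"` substring test, and `Path.is_file()`).

```python
import urllib.parse
from pathlib import Path


def classify_target(target: str) -> str:
    """Label a command line target the same way the entry point branches on it.

    Hypothetical helper: the returned labels are made up for this sketch.
    """
    if urllib.parse.urlparse(target).scheme:
        # Anything with a scheme (https://...) is treated as a url.
        return "notion_url" if "notion.so" in target else "unsupported_url"
    # No scheme: assume a path to a .toml configuration file.
    return "config_file" if Path(target).is_file() else "missing_file"


print(classify_target("https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"))
print(classify_target("https://example.com/some-page"))
```

One caveat worth keeping in mind with this approach: a scheme check is purely syntactic, so an absolute Windows path with a drive letter may also be read as having a scheme, whereas relative paths such as `example\example_site.toml` are not affected.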

41
loconotion/conditions.py Normal file

@ -0,0 +1,41 @@
import logging

log = logging.getLogger(f"loconotion.{__name__}")


class notion_page_loaded(object):
    """An expectation for checking that a notion page has loaded.
    """
    def __init__(self, url):
        self.url = url

    def __call__(self, driver):
        notion_presence = len(driver.find_elements_by_class_name("notion-presence-container"))
        collection_view_block = len(driver.find_elements_by_class_name("notion-collection_view_page-block"))
        collection_search = len(driver.find_elements_by_class_name("collectionSearch"))
        # count the loading spinners still on the page; this variable was referenced
        # but never defined, and uses the same class name as the toggle check below
        loading_spinners = len(driver.find_elements_by_class_name("loading-spinner"))
        # embed_ghosts = len(driver.find_elements_by_css_selector("div[embed-ghost]"))
        log.debug(f"Waiting for page content to load (presence container: {notion_presence}, loaders: {loading_spinners} )")
        if (notion_presence and not loading_spinners):
            return True
        else:
            return False


class toggle_block_has_opened(object):
    """An expectation for checking that a notion toggle block has been opened.
    It does so by checking if the div hosting the content has enough children,
    and the absence of the loading spinner.
    """
    def __init__(self, toggle_block):
        self.toggle_block = toggle_block

    def __call__(self, driver):
        toggle_content = self.toggle_block.find_element_by_css_selector("div:not([style])")
        if (toggle_content):
            content_children = len(toggle_content.find_elements_by_tag_name("div"))
            is_loading = len(self.toggle_block.find_elements_by_class_name("loading-spinner"))
            log.debug(f"Waiting for toggle block to load ({content_children} children so far and {is_loading} loaders)")
            if (content_children > 3 and not is_loading):
                return True
            else:
                return False
        else:
            return False
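Both classes above are plain callables that take the driver and return something truthy once the page is ready, which is the shape Selenium's `WebDriverWait(...).until(...)` expects from a condition. The sketch below reproduces that polling pattern without Selenium so it can run anywhere; `wait_until`, `FakeDriver` and `element_count_at_least` are hypothetical stand-ins written for this illustration, not part of Selenium or Loconotion.

```python
import time


class element_count_at_least:
    """Hypothetical condition: truthy once `driver.count()` reaches `minimum`."""

    def __init__(self, minimum):
        self.minimum = minimum

    def __call__(self, driver):
        return driver.count() >= self.minimum


class FakeDriver:
    """Stand-in for a browser driver: every poll 'loads' one more element."""

    def __init__(self):
        self.loaded = 0

    def count(self):
        self.loaded += 1
        return self.loaded


def wait_until(driver, condition, timeout=1.0, poll_interval=0.01):
    """Call `condition(driver)` repeatedly until it is truthy or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() > deadline:
            raise TimeoutError("condition was not met in time")
        time.sleep(poll_interval)


driver = FakeDriver()
wait_until(driver, element_count_at_least(4))
print(driver.loaded)
```

Keeping each condition in its own small class, as this commit does by moving them to `conditions.py`, means the readiness rules can be read and adjusted without touching the parser itself.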


@ -1,5 +1,4 @@
import os
import platform
import sys
import shutil
import time
@ -10,10 +9,9 @@ import glob
import mimetypes
import urllib.parse
import hashlib
import argparse
from pathlib import Path
log = logging.getLogger("loconotion")
log = logging.getLogger(f"loconotion.{__name__}")
try:
import chromedriver_autoinstaller
@ -26,50 +24,13 @@ try:
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import requests
import toml
import cssutils
cssutils.log.setLevel(logging.CRITICAL) # removes warning logs from cssutils
except ModuleNotFoundError as error:
log.critical(f"ModuleNotFoundError: {error}. Have you installed the requirements?")
sys.exit()
class notion_page_loaded(object):
"""An expectation for checking that a notion page has loaded.
"""
def __init__(self, url):
self.url = url
def __call__(self, driver):
notion_presence = len(driver.find_elements_by_class_name("notion-presence-container"))
collection_view_block = len(driver.find_elements_by_class_name("notion-collection_view_page-block"));
collection_search = len(driver.find_elements_by_class_name("collectionSearch"));
# embed_ghosts = len(driver.find_elements_by_css_selector("div[embed-ghost]"));
log.debug(f"Waiting for page content to load (presence container: {notion_presence}, loaders: {loading_spinners} )")
if (notion_presence and not loading_spinners):
return True
else:
return False
class toggle_block_has_opened(object):
"""An expectation for checking that a notion toggle block has been opened.
It does so by checking if the div hosting the content has enough children,
and the abscence of the loading spinner.
"""
def __init__(self, toggle_block):
self.toggle_block = toggle_block
def __call__(self, driver):
toggle_content = self.toggle_block.find_element_by_css_selector("div:not([style]")
if (toggle_content):
content_children = len(toggle_content.find_elements_by_tag_name("div"))
is_loading = len(self.toggle_block.find_elements_by_class_name("loading-spinner"));
log.debug(f"Waiting for toggle block to load ({content_children} children so far and {is_loading} loaders)")
if (content_children > 3 and not is_loading):
return True
else:
return False
else:
return False
from conditions import toggle_block_has_opened
class Parser():
def __init__(self, config = {}, args = {}):
@ -190,7 +151,7 @@ class Parser():
if (not file_extension):
content_type = response.headers.get('content-type')
if (content_type):
file_extension = mimetypes.guess_extension(content_types)
file_extension = mimetypes.guess_extension(content_type)
destination = destination.with_suffix(file_extension)
Path(destination).parent.mkdir(parents=True, exist_ok=True)
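The corrected call above hands the `Content-Type` header value to `mimetypes.guess_extension`. As a hedged aside, a header may also carry parameters (for example a charset) or be missing altogether, and a lookup on the raw string may then yield no extension. The sketch below shows one defensive way to handle that; `extension_for` is a hypothetical helper and not part of the parser.

```python
import mimetypes


def extension_for(content_type):
    """Guess a file extension from a Content-Type header value.

    Hypothetical helper: returns an empty string when nothing can be guessed.
    """
    if not content_type:
        return ""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime_type = content_type.split(";")[0].strip()
    return mimetypes.guess_extension(mime_type) or ""


print(extension_for("image/png"))
print(extension_for("image/png; charset=binary"))
```

Returning an empty string instead of `None` keeps a later `with_suffix(...)` call safe, since an empty suffix simply leaves the path unchanged.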
@ -229,9 +190,10 @@ class Parser():
logs_path = (Path.cwd() / "logs" / "webdrive.log")
logs_path.parent.mkdir(parents=True, exist_ok=True)
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("window-size=1920,1080")
chrome_options = Options()
if (not self.args.get("non_headless", False)):
chrome_options.add_argument("--headless")
chrome_options.add_argument("window-size=1920,1080")
chrome_options.add_argument("--log-level=3");
chrome_options.add_argument("--silent");
chrome_options.add_argument("--disable-logging")
@ -293,13 +255,13 @@ class Parser():
if (len(new_toggle_blocks) > len(toggle_blocks)):
# if so, run the function again
open_toggle_blocks(opened_toggles)
# open those toggle blocks!
# open the toggle blocks in the page
open_toggle_blocks()
# creates soup from the page to start parsing
soup = BeautifulSoup(self.driver.page_source, "html.parser")
# remove scripts and other tags we don't want / need
for unwanted in soup.findAll('script'):
unwanted.decompose();
@ -312,6 +274,7 @@ class Parser():
for vendors_css in soup.find_all("link", href=lambda x: x and 'vendors~' in x):
vendors_css.decompose();
# clean up the default notion meta tags
for tag in ["description", "twitter:card", "twitter:site", "twitter:title", "twitter:description", "twitter:image", "twitter:url", "apple-itunes-app"]:
unwanted_tag = soup.find("meta", attrs = { "name" : tag})
@ -320,6 +283,7 @@ class Parser():
unwanted_og_tag = soup.find("meta", attrs = { "property" : tag})
if (unwanted_og_tag): unwanted_og_tag.decompose();
# set custom meta tags
custom_meta_tags = self.get_page_config(url).get("meta", [])
for custom_meta_tag in custom_meta_tags:
@ -329,6 +293,7 @@ class Parser():
log.debug(f"Adding meta tag {str(tag)}")
soup.head.append(tag)
# process images
cache_images = True
for img in soup.findAll('img'):
@ -349,6 +314,7 @@ class Parser():
if (img['src'].startswith('/')):
img['src'] = "https://www.notion.so" + img['src']
# process stylesheets
for link in soup.findAll('link', rel="stylesheet"):
if link.has_attr('href') and link['href'].startswith('/'):
@ -368,6 +334,7 @@ class Parser():
rule.style['src'] = f"url({str(cached_font_file)})"
link['href'] = str(cached_css_file)
# add our custom logic to all toggle blocks
for toggle_block in soup.findAll('div',{'class':'notion-toggle-block'}):
toggle_id = uuid.uuid4()
@ -380,6 +347,7 @@ class Parser():
toggle_content['class'] = toggle_content.get('class', []) + ['loconotion-toggle-content']
toggle_content.attrs['loconotion-toggle-id'] = toggle_button.attrs['loconotion-toggle-id'] = toggle_id
# if there are any table views in the page, add links to the title rows
for table_view in soup.findAll('div', {'class':'notion-table-view'}):
for table_row in table_view.findAll('div', {'class':'notion-collection-item'}):
@ -387,12 +355,16 @@ class Parser():
# then grab its href and wrap the table row's name into a link
table_row_block_id = table_row['data-block-id']
table_row_hover_target = self.driver.find_element_by_css_selector(f"div[data-block-id='{table_row_block_id}'] > div > div")
# need to scroll the row into view or else the open button won't be visible to selenium
self.driver.execute_script("arguments[0].scrollIntoView();", table_row_hover_target)
ActionChains(self.driver).move_to_element(table_row_hover_target).perform()
try:
WebDriverWait(self.driver, 3).until(EC.presence_of_element_located((By.CSS_SELECTOR, f"div[data-block-id='{table_row_block_id}'] > div > a")))
WebDriverWait(self.driver, 5).until(EC.visibility_of_element_located(
(By.CSS_SELECTOR, f"div[data-block-id='{table_row_block_id}'] > div > a")))
except TimeoutException as ex:
log.error("Timeout")
log.error(f"Timeout waiting for the 'open' button for row in table with block id {table_row_block_id}")
table_row_href = self.driver.find_element_by_css_selector(f"div[data-block-id='{table_row_block_id}'] > div > a").get_attribute('href')
table_row_href = table_row_href.split("notion.so")[-1]
row_target_span = table_row.find("span")
row_link_wrapper = soup.new_tag('a', attrs={'href': table_row_href, 'style':"cursor: pointer;"})
row_target_span.wrap(row_link_wrapper)
@ -435,6 +407,7 @@ class Parser():
# finally append the font overrides stylesheets to the page
soup.head.append(font_override_stylesheet)
# inject any custom elements to the page
custom_injects = self.get_page_config(url).get("inject", {})
def injects_custom_tags(section):
@ -456,6 +429,7 @@ class Parser():
injects_custom_tags("head")
injects_custom_tags("body")
# inject loconotion's custom stylesheet and script
loconotion_custom_css = self.cache_file(Path("bundles/loconotion.css"))
custom_css = soup.new_tag("link", rel="stylesheet", href=str(loconotion_custom_css))
@ -464,6 +438,7 @@ class Parser():
custom_script = soup.new_tag("script", type="text/javascript", src=str(loconotion_custom_js))
soup.body.insert(-1, custom_script)
# find sub-pages and clean slugs / links
sub_pages = [];
for a in soup.findAll('a'):
@ -483,6 +458,7 @@ class Parser():
sub_pages.append(sub_page_href)
log.debug(f"Found link to page {a['href']}")
# exports the parsed page
html_str = str(soup)
html_file = self.get_page_slug(url) if url != index else "index.html"
@ -494,99 +470,22 @@ class Parser():
f.write(html_str.encode('utf-8').strip())
processed_pages[url] = html_file
# parse sub-pages
if (sub_pages and not self.args.get("single_page", False)):
if (processed_pages): log.debug(f"Pages processed so far: {len(processed_pages)}")
for sub_page in sub_pages:
if not sub_page in processed_pages.keys():
self.parse_page(sub_page, processed_pages = processed_pages, index = index)
#we're all done!
return processed_pages
def run(self, url):
start_time = time.time()
total_processed_pages = self.parse_page(url)
elapsed_time = time.time() - start_time
formatted_time = '{:02d}:{:02d}:{:02d}'.format(int(elapsed_time // 3600), int(elapsed_time % 3600 // 60), int(elapsed_time % 60))
log.info(f'Finished!\n\n\tヽ( ・‿・)ノ Processed {len(total_processed_pages)} pages in {formatted_time}')
if __name__ == '__main__':
# set up argument parser
parser = argparse.ArgumentParser(description='Generate static websites from Notion.so pages')
parser.add_argument('target', help='The config file containing the site properties, or the url of the Notion.so page to generate the site from')
parser.add_argument('--chromedriver', help='Use a specific chromedriver executable instead of the auto-installing one')
parser.add_argument("--single-page", action="store_true", default=False, help="Only parse the first page, then stop")
parser.add_argument('--clean', action='store_true', default=False, help='Delete all previously cached files for the site before generating it')
parser.add_argument("-v", "--verbose", action="store_true", help="Shows way more exciting facts in the output")
args = parser.parse_args()
# set up some pretty logs
log = logging.getLogger("loconotion")
log.setLevel(logging.INFO if not args.verbose else logging.DEBUG)
log_screen_handler = logging.StreamHandler(stream=sys.stdout)
log.addHandler(log_screen_handler)
log.propagate = False
try:
import colorama, copy
LOG_COLORS = {
logging.DEBUG: colorama.Fore.GREEN,
logging.INFO: colorama.Fore.BLUE,
logging.WARNING: colorama.Fore.YELLOW,
logging.ERROR: colorama.Fore.RED,
logging.CRITICAL: colorama.Back.RED
}
class ColorFormatter(logging.Formatter):
def format(self, record, *args, **kwargs):
# if the corresponding logger has children, they may receive modified
# record, so we want to keep it intact
new_record = copy.copy(record)
if new_record.levelno in LOG_COLORS:
new_record.levelname = "{color_begin}{level}{color_end}".format(
level=new_record.levelname,
color_begin=LOG_COLORS[new_record.levelno],
color_end=colorama.Style.RESET_ALL,
)
return super(ColorFormatter, self).format(new_record, *args, **kwargs)
log_screen_handler.setFormatter(ColorFormatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
datefmt="{color_begin}[%H:%M:%S]{color_end}".format(
color_begin=colorama.Style.DIM,
color_end=colorama.Style.RESET_ALL
)))
except ModuleNotFoundError as identifier:
pass
# parse the provided arguments
try:
if urllib.parse.urlparse(args.target).scheme:
try:
response = requests.get(args.target)
if ("notion.so" in args.target):
log.info("Initialising parser with simple page url")
config = { "page" : args.target }
Parser(config = config, args = vars(args))
else:
log.critical(f"{args.target} is not a notion.so page")
except requests.ConnectionError as exception:
log.critical(f"Connection error")
else:
if Path(args.target).is_file():
with open(args.target) as f:
parsed_config = toml.loads(f.read())
log.info(f"Initialising parser with configuration file")
log.debug(parsed_config)
Parser(config = parsed_config, args = vars(args))
else:
log.critical(f"Config file {args.target} does not exists")
except FileNotFoundError as e:
log.critical(f'FileNotFoundError: {e}')
sys.exit(0)
except KeyboardInterrupt:
log.critical('Interrupted by user')
try:
sys.exit(0)
except SystemExit:
os._exit(0)
log.info(f'Finished!\n\nProcessed {len(total_processed_pages)} pages in {formatted_time}')