Tidied up project structure, improved table links parsing

Leonardo Cavaletti 2020-05-23 17:32:53 +01:00
parent 4f507e4485
commit bd76bc3089
4 changed files with 202 additions and 146 deletions

@ -1,16 +1,20 @@
# Loconotion # Loconotion
**Loconotion** is a Python script that parses a [Notion.so](https://notion.so) public page (alongside with all of its subpages) and generates a static site out of it. **Loconotion** is a Python script that parses a [Notion.so](https://notion.so) public page (alongside with all of its subpages) and generates a static site out of it.
## But Why? ## But Why?
[Notion](https://notion.so) is a web app where you can create your own workspace / personal wiki out of content blocks. It feels good to use, and the results look very pretty - the developers did a great job. Given that it also offers the possibility of making a page (and its sub-pages) public on the web, several people choose to use Notion to manage their personal blog, portfolio, or some kind of simple website. Sadly Notion does not support custom domains: your public pages are stuck in the `notion.so` domain, under long computer generated URLs. [Notion](https://notion.so) is a web app where you can create your own workspace / personal wiki out of content blocks. It feels good to use, and the results look very pretty - the developers did a great job. Given that it also offers the possibility of making a page (and its sub-pages) public on the web, several people choose to use Notion to manage their personal blog, portfolio, or some kind of simple website. Sadly Notion does not support custom domains: your public pages are stuck in the `notion.so` domain, under long computer generated URLs.
Some services like Super, HostingPotion, HostNotion and Fruition try to work around this issue by relying on a [clever hack](https://gist.github.com/mayneyao/b9fefc9625b76f70488e5d8c2a99315d) using CloudFlare workers. This solution, however, has some disadvantages: Some services like Super, HostingPotion, HostNotion and Fruition try to work around this issue by relying on a [clever hack](https://gist.github.com/mayneyao/b9fefc9625b76f70488e5d8c2a99315d) using CloudFlare workers. This solution, however, has some disadvantages:
- **Not free** - Super, HostingPotion and HostNotion all take a monthly fee since they manage all the "hacky bits" for you; Fruition is open-source but any domain with a decent amount of daily visit will soon clash against CloudFlare's free tier limitations, and force you to upgrade to the 5$ or more plan (plus you need to setup Cloudflare yourself)
- **Not free** - Super, HostingPotion and HostNotion all take a monthly fee since they manage all the "hacky bits" for you; Fruition is open-source but any domain with a decent amount of daily visit will soon clash against CloudFlare's free tier limitations, and force you to upgrade to the 5\$ or more plan (plus you need to setup Cloudflare yourself)
- **Slow-ish** - As the page is still hosted on Notion, it comes bundled with all their analytics, editing / collaboration javascript, vendors css, and more bloat which causes the page to load at speeds that are not exactly appropriate for a simple blog / website. Running [this](https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e) example page on Google's [PageSpeed Insights](https://developers.google.com/speed/pagespeed/insights/) scores a measly **24 - 66** on mobile / desktop. - **Slow-ish** - As the page is still hosted on Notion, it comes bundled with all their analytics, editing / collaboration javascript, vendors css, and more bloat which causes the page to load at speeds that are not exactly appropriate for a simple blog / website. Running [this](https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e) example page on Google's [PageSpeed Insights](https://developers.google.com/speed/pagespeed/insights/) scores a measly **24 - 66** on mobile / desktop.
- **Ugly URLs** - While the services above enable the use of custom domains, the URLs for individual pages are stuck with the long, ugly, original Notion URL (apart from Fruition - they got custom URLs figured out, although you will always see the original URL flashing for an instant when the page is loaded). - **Ugly URLs** - While the services above enable the use of custom domains, the URLs for individual pages are stuck with the long, ugly, original Notion URL (apart from Fruition - they got custom URLs figured out, although you will always see the original URL flashing for an instant when the page is loaded).
- **Notion Free Account Limitations** - Recently Notion introduced a change to its pricing model where public pages can't be set to be indexed by search engines on a free account (but they also removed the blocks count limitations, which is a good trade-off if you ask me) - **Notion Free Account Limitations** - Recently Notion introduced a change to its pricing model where public pages can't be set to be indexed by search engines on a free account (but they also removed the blocks count limitations, which is a good trade-off if you ask me)
Loconotion approaches this a bit differently. It lets Notion render the page, then scrapes it and saves a static version of the page to disk. This offers the following benefits: Loconotion approaches this a bit differently. It lets Notion render the page, then scrapes it and saves a static version of the page to disk. This offers the following benefits:
- Strips out all the unnecessary bloat, like Notion's analytics, vendors scripts / styles, and javascript left in to enable collaboration. - Strips out all the unnecessary bloat, like Notion's analytics, vendors scripts / styles, and javascript left in to enable collaboration.
- Caches all images / assets / fonts (hashing filenames), while keeping links intact. - Caches all images / assets / fonts (hashing filenames), while keeping links intact.
- Cleans up the pages urls, letting you use custom slugs if desired - Cleans up the pages urls, letting you use custom slugs if desired
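The "hashing filenames" part of the caching step can be pictured with a short sketch. This is not Loconotion's actual implementation: `hashed_cache_name` is a hypothetical helper, and both the use of SHA-1 over the full URL and the `dist/assets` folder are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def hashed_cache_name(url: str, cache_dir: Path = Path("dist/assets")) -> Path:
    """Map an asset URL to a stable local filename (hypothetical helper)."""
    # Hash the full URL so two assets sharing a basename never collide.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    # Keep the original extension (if any) so the file type stays recognisable.
    suffix = Path(urlparse(url).path).suffix
    return cache_dir / f"{digest}{suffix}"


print(hashed_cache_name("https://www.notion.so/images/logo.png?v=2"))
```

Because the name depends only on the URL, re-running the script maps the same asset to the same file, which is what would make a cache reusable between runs.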
@ -22,6 +26,7 @@ Loconotion approaches this a bit differently. It lets Notion render the page, th
The result? A faster, self-contained version of the page that keeps all of Notion's nice layouts and eye candies. For comparison, the same example page parsed with Loconotion and deployed on Netlify's free tier achieves a PageSpeed Insight score of **96 - 100**! The result? A faster, self-contained version of the page that keeps all of Notion's nice layouts and eye candies. For comparison, the same example page parsed with Loconotion and deployed on Netlify's free tier achieves a PageSpeed Insight score of **96 - 100**!
Bear in mind that as we are effectively parsing a static version of the page, there are some limitations compared to Notion's live public pages: Bear in mind that as we are effectively parsing a static version of the page, there are some limitations compared to Notion's live public pages:
- All pages will open in their own page and not modals (depending on how you look at it this could be a plus) - All pages will open in their own page and not modals (depending on how you look at it this could be a plus)
- Databases will be presented in their initial view - for example, no switching views from table to gallery and such - Databases will be presented in their initial view - for example, no switching views from table to gallery and such
- All editing features will be disabled - no ticking checkboxes or dragging kanban boards cards around. Usually not an issue since a public page to serve as a website would have changes locked. - All editing features will be disabled - no ticking checkboxes or dragging kanban boards cards around. Usually not an issue since a public page to serve as a website would have changes locked.
@ -30,25 +35,30 @@ Bear in mind that as we are effectively parsing a static version of the page, th
Everything else should be fine. Loconotion rebuilds the logic for toggle boxes and embeds so they still work; plus it defines some additional CSS rules to enable mobile responsiveness across the whole site (in some cases looking even better than Notion's defaults - wasn't exactly thought for mobile). Everything else should be fine. Loconotion rebuilds the logic for toggle boxes and embeds so they still work; plus it defines some additional CSS rules to enable mobile responsiveness across the whole site (in some cases looking even better than Notion's defaults - wasn't exactly thought for mobile).
### But Notion already had an html export function? ### But Notion already had an html export function?
It does, but I wasn't really happy with the styling - the pages looked a bit uglier than what they look like on a live Notion page. Plus, it doesn't support all the cool customization features outlined above! It does, but I wasn't really happy with the styling - the pages looked a bit uglier than what they look like on a live Notion page. Plus, it doesn't support all the cool customization features outlined above!
## Installation & Requirements ## Installation & Requirements
`pip install -r requirements.txt` `pip install -r requirements.txt`
This script uses [ChromeDriver](chromedriver.chromium.org) to automate the Google Chrome browser - therefore Google Chrome needs to be installed in order to work. This script uses [ChromeDriver](chromedriver.chromium.org) to automate the Google Chrome browser - therefore Google Chrome needs to be installed in order to work.
The script comes bundled with the default windows chromedriver executable. On Mac / Linux, download the right distribution for you from https://chromedriver.chromium.org/downloads and place the executable in this folder. Alternatively, use the `--chromedriver` argument to specify its path at runtime The script comes bundled with the default windows chromedriver executable. On Mac / Linux, download the right distribution for you from https://chromedriver.chromium.org/downloads and place the executable in this folder. Alternatively, use the `--chromedriver` argument to specify its path at runtime.
## Simple Usage ## Simple Usage
`python loconotion.py https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e`
`python loconotion https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e`
In its simplest form, the script takes the URL of a public Notion.so page, and generates the site inside the `dist` folder, based on the page's title (the above example will generate the site inside `dist\The-perfect-It-s-Always-Sunny-in-Philadelphia\`). In its simplest form, the script takes the URL of a public Notion.so page, and generates the site inside the `dist` folder, based on the page's title (the above example will generate the site inside `dist\The-perfect-It-s-Always-Sunny-in-Philadelphia\`).
## Advanced Usage ## Advanced Usage
You can fully configure Loconotion to your needs by passing a [.toml](https://github.com/toml-lang/toml) configuration file to the script instead: You can fully configure Loconotion to your needs by passing a [.toml](https://github.com/toml-lang/toml) configuration file to the script instead:
`python loconotion.py example\example_site.toml` `python loconotion example\example_site.toml`
Here's what a full configuration would look like, alongside with explanations for each parameter. Here's what a full configuration would look like, alongside with explanations for each parameter.
```toml ```toml
## Loconotion Site Configuration File ## ## Loconotion Site Configuration File ##
# full .toml configuration example file to showcase all of Loconotion's available settings # full .toml configuration example file to showcase all of Loconotion's available settings
@ -139,6 +149,7 @@ page = "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
``` ```
On top of this, the script can take these optional arguments: On top of this, the script can take these optional arguments:
``` ```
--clean Delete all previously cached files for the site before generating it --clean Delete all previously cached files for the site before generating it
-v, --verbose Shows way more exciting facts in the output -v, --verbose Shows way more exciting facts in the output
@ -146,16 +157,19 @@ On top of this, the script can take this optional arguments:
``` ```
## Roadmap / Features wishlist ## Roadmap / Features wishlist
- [ ] Dark / custom themes
- [ ] Dark / light theme toggle
- [ ] Automated Netlify / GitHub pages / Vercel deployments - [ ] Automated Netlify / GitHub pages / Vercel deployments
- [ ] Injectable custom HTML - [ ] Injectable custom HTML
- [ ] Html / css / js minification & images optimization - [ ] Html / css / js minification & images optimization
- [ ] GUI / sites manager, potentially as a paid add-on to rack up some $ - daddy needs money to get more second-hand thinkpads to use as random servers - [ ] Custom theming
## Who uses this? ## Sites built with Loconotion
If you used Loconotion to build a cool site, shoot me a mail! I'd love to feature it in some sort of showcase.
- [leoncvlt.com](https://leoncvlt.com)
If you used Loconotion to build a cool site and want it added to the list above, shoot me a mail!
## Support ## Support
If you found this useful, and / or it saved you some money, consider using part of that saved money to buy me a coffee and maintain the balance in the universe.
<a href="https://www.buymeacoffee.com/leoncvlt" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/default-blue.png" alt="Buy Me A Coffee" style="height: 51px !important;width: 217px !important;" ></a> ![](https://img.shields.io/badge/-buy%20me%20a%20coffee-lightgrey?style=flat&logo=buy-me-a-coffee&color=FF813F&logoColor=white) If you found this useful, consider buying me a coffee so I get a nice dose of methylxanthine, and you get a nice dose of good karma.

102
loconotion/__main__.py Normal file
@ -0,0 +1,102 @@
import os
import sys
import logging
import urllib.parse
import argparse
from pathlib import Path

log = logging.getLogger("loconotion")

try:
    import requests
    import toml
except ModuleNotFoundError as error:
    log.critical(f"ModuleNotFoundError: {error}. Have you installed the requirements?")
    sys.exit()

from notionparser import Parser

def main():
    # set up argument parser
    argparser = argparse.ArgumentParser(description='Generate static websites from Notion.so pages')
    argparser.add_argument('target', help='The config file containing the site properties, or the url of the Notion.so page to generate the site from')
    argparser.add_argument('--chromedriver', help='Use a specific chromedriver executable instead of the auto-installing one')
    argparser.add_argument("--single-page", action="store_true", help="Only parse the first page, then stop")
    argparser.add_argument('--clean', action='store_true', help='Delete all previously cached files for the site before generating it')
    argparser.add_argument('--non-headless', action='store_true', help='Run chromedriver in non-headless mode')
    argparser.add_argument("-v", "--verbose", action="store_true", help="Increase output log verbosity")
    args = argparser.parse_args()

    # set up some pretty logs
    log = logging.getLogger("loconotion")
    log.setLevel(logging.INFO if not args.verbose else logging.DEBUG)
    log_screen_handler = logging.StreamHandler(stream=sys.stdout)
    log.addHandler(log_screen_handler)
    log.propagate = False
    try:
        import colorama, copy

        LOG_COLORS = {
            logging.DEBUG: colorama.Fore.GREEN,
            logging.INFO: colorama.Fore.BLUE,
            logging.WARNING: colorama.Fore.YELLOW,
            logging.ERROR: colorama.Fore.RED,
            logging.CRITICAL: colorama.Back.RED
        }

        class ColorFormatter(logging.Formatter):
            def format(self, record, *args, **kwargs):
                # if the corresponding logger has children, they may receive modified
                # record, so we want to keep it intact
                new_record = copy.copy(record)
                if new_record.levelno in LOG_COLORS:
                    new_record.levelname = "{color_begin}{level}{color_end}".format(
                        level=new_record.levelname,
                        color_begin=LOG_COLORS[new_record.levelno],
                        color_end=colorama.Style.RESET_ALL,
                    )
                return super(ColorFormatter, self).format(new_record, *args, **kwargs)

        log_screen_handler.setFormatter(ColorFormatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
            datefmt="{color_begin}[%H:%M:%S]{color_end}".format(
                color_begin=colorama.Style.DIM,
                color_end=colorama.Style.RESET_ALL
            )))
    except ModuleNotFoundError as identifier:
        # colorama is optional: fall back to plain, uncoloured logs
        pass

    # initialise and run the website parser
    try:
        if urllib.parse.urlparse(args.target).scheme:
            # the target has a scheme, so treat it as the url of a page
            try:
                response = requests.get(args.target)
                if ("notion.so" in args.target):
                    log.info("Initialising parser with simple page url")
                    config = { "page" : args.target }
                    Parser(config = config, args = vars(args))
                else:
                    log.critical(f"{args.target} is not a notion.so page")
            except requests.ConnectionError as exception:
                log.critical("Connection error")
        else:
            # otherwise treat the target as the path of a .toml configuration file
            if Path(args.target).is_file():
                with open(args.target) as f:
                    parsed_config = toml.loads(f.read())
                    log.info("Initialising parser with configuration file")
                    log.debug(parsed_config)
                    Parser(config = parsed_config, args = vars(args))
            else:
                log.critical(f"Config file {args.target} does not exist")
    except FileNotFoundError as e:
        log.critical(f'FileNotFoundError: {e}')
        sys.exit(0)

if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        log.critical('Interrupted by user')
        try:
            sys.exit(0)
        except SystemExit:
            os._exit(0)
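The way `main()` decides between a page URL and a configuration file can be isolated into a small function. The sketch below only mirrors that branching; `classify_target` and its return strings are hypothetical names that do not exist in the commit.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_target(target: str) -> str:
    """Return 'url', 'config' or 'invalid' for a command-line target (hypothetical helper)."""
    if urlparse(target).scheme:
        # Anything with a scheme is treated as a URL; only notion.so pages are accepted.
        return "url" if "notion.so" in target else "invalid"
    # No scheme: treat the target as the path of a .toml configuration file.
    return "config" if Path(target).is_file() else "invalid"


print(classify_target("https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"))
print(classify_target("no_such_file.toml"))
```

One caveat of relying on the scheme alone: an absolute Windows path with a drive letter (for example `C:\site.toml`) appears to parse with a one-letter scheme, so it would take the URL branch instead of the config branch.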

41
loconotion/conditions.py Normal file
@ -0,0 +1,41 @@
import logging

log = logging.getLogger(f"loconotion.{__name__}")

class notion_page_loaded(object):
    """An expectation for checking that a notion page has loaded.
    """
    def __init__(self, url):
        self.url = url

    def __call__(self, driver):
        notion_presence = len(driver.find_elements_by_class_name("notion-presence-container"))
        collection_view_block = len(driver.find_elements_by_class_name("notion-collection_view_page-block"))
        collection_search = len(driver.find_elements_by_class_name("collectionSearch"))
        # loading_spinners is used below but was never assigned; counting the
        # "loading-spinner" class (as toggle_block_has_opened does) is an assumption
        loading_spinners = len(driver.find_elements_by_class_name("loading-spinner"))
        # embed_ghosts = len(driver.find_elements_by_css_selector("div[embed-ghost]"));
        log.debug(f"Waiting for page content to load (presence container: {notion_presence}, loaders: {loading_spinners} )")
        if (notion_presence and not loading_spinners):
            return True
        else:
            return False

class toggle_block_has_opened(object):
    """An expectation for checking that a notion toggle block has been opened.
    It does so by checking if the div hosting the content has enough children,
    and the absence of the loading spinner.
    """
    def __init__(self, toggle_block):
        self.toggle_block = toggle_block

    def __call__(self, driver):
        # the selector was missing the closing parenthesis of :not(...)
        toggle_content = self.toggle_block.find_element_by_css_selector("div:not([style])")
        if (toggle_content):
            content_children = len(toggle_content.find_elements_by_tag_name("div"))
            is_loading = len(self.toggle_block.find_elements_by_class_name("loading-spinner"))
            log.debug(f"Waiting for toggle block to load ({content_children} children so far and {is_loading} loaders)")
            if (content_children > 3 and not is_loading):
                return True
            else:
                return False
        else:
            return False
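Both classes are callables meant to be handed to Selenium's `WebDriverWait(...).until(...)`, which keeps calling them with the driver until they return something truthy or the timeout expires. The sketch below imitates that polling loop without Selenium; `wait_until`, `FakeDriver` and `spinners_gone` are hypothetical stand-ins written only for this illustration.

```python
import time


def wait_until(driver, condition, timeout=3.0, poll=0.1):
    """Poll `condition(driver)` until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met in time")
        time.sleep(poll)


class FakeDriver:
    """Stand-in for a browser: reports a spinner for the first two polls only."""
    def __init__(self):
        self.polls = 0

    def spinner_count(self):
        self.polls += 1
        return 1 if self.polls <= 2 else 0


class spinners_gone:
    """Condition object written in the same style as the classes above."""
    def __call__(self, driver):
        return driver.spinner_count() == 0


driver = FakeDriver()
wait_until(driver, spinners_gone())
print(f"condition met after {driver.polls} polls")
```

Keeping each condition in its own class means the waiting logic in the parser stays a single `until(...)` call, while the page-specific checks live in `conditions.py`.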

@ -1,5 +1,4 @@
import os import os
import platform
import sys import sys
import shutil import shutil
import time import time
@ -10,10 +9,9 @@ import glob
import mimetypes import mimetypes
import urllib.parse import urllib.parse
import hashlib import hashlib
import argparse
from pathlib import Path from pathlib import Path
log = logging.getLogger("loconotion") log = logging.getLogger(f"loconotion.{__name__}")
try: try:
import chromedriver_autoinstaller import chromedriver_autoinstaller
@ -26,50 +24,13 @@ try:
from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup from bs4 import BeautifulSoup
import requests import requests
import toml
import cssutils import cssutils
cssutils.log.setLevel(logging.CRITICAL) # removes warning logs from cssutils cssutils.log.setLevel(logging.CRITICAL) # removes warning logs from cssutils
except ModuleNotFoundError as error: except ModuleNotFoundError as error:
log.critical(f"ModuleNotFoundError: {error}. Have you installed the requirements?") log.critical(f"ModuleNotFoundError: {error}. Have you installed the requirements?")
sys.exit() sys.exit()
class notion_page_loaded(object): from conditions import toggle_block_has_opened
"""An expectation for checking that a notion page has loaded.
"""
def __init__(self, url):
self.url = url
def __call__(self, driver):
notion_presence = len(driver.find_elements_by_class_name("notion-presence-container"))
collection_view_block = len(driver.find_elements_by_class_name("notion-collection_view_page-block"));
collection_search = len(driver.find_elements_by_class_name("collectionSearch"));
# embed_ghosts = len(driver.find_elements_by_css_selector("div[embed-ghost]"));
log.debug(f"Waiting for page content to load (presence container: {notion_presence}, loaders: {loading_spinners} )")
if (notion_presence and not loading_spinners):
return True
else:
return False
class toggle_block_has_opened(object):
"""An expectation for checking that a notion toggle block has been opened.
It does so by checking if the div hosting the content has enough children,
and the abscence of the loading spinner.
"""
def __init__(self, toggle_block):
self.toggle_block = toggle_block
def __call__(self, driver):
toggle_content = self.toggle_block.find_element_by_css_selector("div:not([style]")
if (toggle_content):
content_children = len(toggle_content.find_elements_by_tag_name("div"))
is_loading = len(self.toggle_block.find_elements_by_class_name("loading-spinner"));
log.debug(f"Waiting for toggle block to load ({content_children} children so far and {is_loading} loaders)")
if (content_children > 3 and not is_loading):
return True
else:
return False
else:
return False
class Parser(): class Parser():
def __init__(self, config = {}, args = {}): def __init__(self, config = {}, args = {}):
@ -190,7 +151,7 @@ class Parser():
if (not file_extension): if (not file_extension):
content_type = response.headers.get('content-type') content_type = response.headers.get('content-type')
if (content_type): if (content_type):
file_extension = mimetypes.guess_extension(content_types) file_extension = mimetypes.guess_extension(content_type)
destination = destination.with_suffix(file_extension) destination = destination.with_suffix(file_extension)
Path(destination).parent.mkdir(parents=True, exist_ok=True) Path(destination).parent.mkdir(parents=True, exist_ok=True)
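The fix above (`content_types` to `content_type`) restores the fallback that guesses a file extension from the response's `Content-Type` header. A slightly more defensive variant is sketched below; `extension_from_content_type` is a hypothetical helper, and stripping header parameters before the lookup is an addition that is not in the commit.

```python
import mimetypes
from typing import Optional


def extension_from_content_type(content_type: Optional[str]) -> str:
    """Return an extension such as '.png' for a Content-Type header, or '' if unknown."""
    if not content_type:
        return ""
    # Drop parameters such as '; charset=utf-8' before the lookup.
    mime_type = content_type.split(";", 1)[0].strip()
    # guess_extension returns None for unknown types; normalise that to ''.
    return mimetypes.guess_extension(mime_type) or ""


print(extension_from_content_type("image/png"))
```

Returning an empty string instead of `None` keeps a later `destination.with_suffix(...)` call valid even when the type is unknown.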
@ -230,6 +191,7 @@ class Parser():
logs_path.parent.mkdir(parents=True, exist_ok=True) logs_path.parent.mkdir(parents=True, exist_ok=True)
chrome_options = Options() chrome_options = Options()
if (not self.args.get("non_headless", False)):
chrome_options.add_argument("--headless") chrome_options.add_argument("--headless")
chrome_options.add_argument("window-size=1920,1080") chrome_options.add_argument("window-size=1920,1080")
chrome_options.add_argument("--log-level=3"); chrome_options.add_argument("--log-level=3");
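The new `non_headless` lookup works because `argparse` turns the `--non-headless` flag into a `non_headless` attribute, and `vars(args)` exposes it as a dictionary key. A minimal sketch, using an explicit argument list instead of `sys.argv` so it can be run anywhere:

```python
import argparse

argparser = argparse.ArgumentParser()
argparser.add_argument("--non-headless", action="store_true")
argparser.add_argument("--single-page", action="store_true")

# Parse an explicit list rather than the real command line.
args = vars(argparser.parse_args(["--non-headless"]))

# Same test the parser applies before adding the "--headless" Chrome option.
run_headless = not args.get("non_headless", False)
print(args, run_headless)
```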
@ -293,13 +255,13 @@ class Parser():
if (len(new_toggle_blocks) > len(toggle_blocks)): if (len(new_toggle_blocks) > len(toggle_blocks)):
# if so, run the function again # if so, run the function again
open_toggle_blocks(opened_toggles) open_toggle_blocks(opened_toggles)
# open the toggle blocks in the page
# open those toggle blocks!
open_toggle_blocks() open_toggle_blocks()
# creates soup from the page to start parsing # creates soup from the page to start parsing
soup = BeautifulSoup(self.driver.page_source, "html.parser") soup = BeautifulSoup(self.driver.page_source, "html.parser")
# remove scripts and other tags we don't want / need # remove scripts and other tags we don't want / need
for unwanted in soup.findAll('script'): for unwanted in soup.findAll('script'):
unwanted.decompose(); unwanted.decompose();
@ -312,6 +274,7 @@ class Parser():
for vendors_css in soup.find_all("link", href=lambda x: x and 'vendors~' in x): for vendors_css in soup.find_all("link", href=lambda x: x and 'vendors~' in x):
vendors_css.decompose(); vendors_css.decompose();
# clean up the default notion meta tags # clean up the default notion meta tags
for tag in ["description", "twitter:card", "twitter:site", "twitter:title", "twitter:description", "twitter:image", "twitter:url", "apple-itunes-app"]: for tag in ["description", "twitter:card", "twitter:site", "twitter:title", "twitter:description", "twitter:image", "twitter:url", "apple-itunes-app"]:
unwanted_tag = soup.find("meta", attrs = { "name" : tag}) unwanted_tag = soup.find("meta", attrs = { "name" : tag})
@ -320,6 +283,7 @@ class Parser():
unwanted_og_tag = soup.find("meta", attrs = { "property" : tag}) unwanted_og_tag = soup.find("meta", attrs = { "property" : tag})
if (unwanted_og_tag): unwanted_og_tag.decompose(); if (unwanted_og_tag): unwanted_og_tag.decompose();
# set custom meta tags # set custom meta tags
custom_meta_tags = self.get_page_config(url).get("meta", []) custom_meta_tags = self.get_page_config(url).get("meta", [])
for custom_meta_tag in custom_meta_tags: for custom_meta_tag in custom_meta_tags:
@ -329,6 +293,7 @@ class Parser():
log.debug(f"Adding meta tag {str(tag)}") log.debug(f"Adding meta tag {str(tag)}")
soup.head.append(tag) soup.head.append(tag)
# process images # process images
cache_images = True cache_images = True
for img in soup.findAll('img'): for img in soup.findAll('img'):
@ -349,6 +314,7 @@ class Parser():
if (img['src'].startswith('/')): if (img['src'].startswith('/')):
img['src'] = "https://www.notion.so" + img['src'] img['src'] = "https://www.notion.so" + img['src']
# process stylesheets # process stylesheets
for link in soup.findAll('link', rel="stylesheet"): for link in soup.findAll('link', rel="stylesheet"):
if link.has_attr('href') and link['href'].startswith('/'): if link.has_attr('href') and link['href'].startswith('/'):
@ -368,6 +334,7 @@ class Parser():
rule.style['src'] = f"url({str(cached_font_file)})" rule.style['src'] = f"url({str(cached_font_file)})"
link['href'] = str(cached_css_file) link['href'] = str(cached_css_file)
# add our custom logic to all toggle blocks # add our custom logic to all toggle blocks
for toggle_block in soup.findAll('div',{'class':'notion-toggle-block'}): for toggle_block in soup.findAll('div',{'class':'notion-toggle-block'}):
toggle_id = uuid.uuid4() toggle_id = uuid.uuid4()
@ -380,6 +347,7 @@ class Parser():
toggle_content['class'] = toggle_content.get('class', []) + ['loconotion-toggle-content'] toggle_content['class'] = toggle_content.get('class', []) + ['loconotion-toggle-content']
toggle_content.attrs['loconotion-toggle-id'] = toggle_button.attrs['loconotion-toggle-id'] = toggle_id toggle_content.attrs['loconotion-toggle-id'] = toggle_button.attrs['loconotion-toggle-id'] = toggle_id
# if there are any table views in the page, add links to the title rows # if there are any table views in the page, add links to the title rows
for table_view in soup.findAll('div', {'class':'notion-table-view'}): for table_view in soup.findAll('div', {'class':'notion-table-view'}):
for table_row in table_view.findAll('div', {'class':'notion-collection-item'}): for table_row in table_view.findAll('div', {'class':'notion-collection-item'}):
@ -387,12 +355,16 @@ class Parser():
# then grab its href and wrap the table row's name into a link # then grab its href and wrap the table row's name into a link
table_row_block_id = table_row['data-block-id'] table_row_block_id = table_row['data-block-id']
table_row_hover_target = self.driver.find_element_by_css_selector(f"div[data-block-id='{table_row_block_id}'] > div > div") table_row_hover_target = self.driver.find_element_by_css_selector(f"div[data-block-id='{table_row_block_id}'] > div > div")
# need to scroll the row into view or else the open button won't be visible to selenium
self.driver.execute_script("arguments[0].scrollIntoView();", table_row_hover_target)
ActionChains(self.driver).move_to_element(table_row_hover_target).perform() ActionChains(self.driver).move_to_element(table_row_hover_target).perform()
try: try:
WebDriverWait(self.driver, 3).until(EC.presence_of_element_located((By.CSS_SELECTOR, f"div[data-block-id='{table_row_block_id}'] > div > a"))) WebDriverWait(self.driver, 5).until(EC.visibility_of_element_located(
(By.CSS_SELECTOR, f"div[data-block-id='{table_row_block_id}'] > div > a")))
except TimeoutException as ex: except TimeoutException as ex:
log.error("Timeout") log.error(f"Timeout waiting for the 'open' button for row in table with block id {table_row_block_id}")
table_row_href = self.driver.find_element_by_css_selector(f"div[data-block-id='{table_row_block_id}'] > div > a").get_attribute('href') table_row_href = self.driver.find_element_by_css_selector(f"div[data-block-id='{table_row_block_id}'] > div > a").get_attribute('href')
table_row_href = table_row_href.split("notion.so")[-1]
row_target_span = table_row.find("span") row_target_span = table_row.find("span")
row_link_wrapper = soup.new_tag('a', attrs={'href': table_row_href, 'style':"cursor: pointer;"}) row_link_wrapper = soup.new_tag('a', attrs={'href': table_row_href, 'style':"cursor: pointer;"})
row_target_span.wrap(row_link_wrapper) row_target_span.wrap(row_link_wrapper)
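The added `split("notion.so")[-1]` line turns the absolute link read from the browser into a site-relative one. A small sketch of that behaviour, where `to_relative_href` is a hypothetical wrapper name and the URL is the sample page from the README's configuration example:

```python
def to_relative_href(href: str) -> str:
    """Keep only what follows the last 'notion.so' in a link."""
    return href.split("notion.so")[-1]


print(to_relative_href("https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"))
print(to_relative_href("/already-relative"))
```

A link that does not contain `notion.so` at all is returned unchanged, since `split` then yields a single-element list.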
@ -435,6 +407,7 @@ class Parser():
# finally append the font overrides stylesheets to the page # finally append the font overrides stylesheets to the page
soup.head.append(font_override_stylesheet) soup.head.append(font_override_stylesheet)
# inject any custom elements to the page # inject any custom elements to the page
custom_injects = self.get_page_config(url).get("inject", {}) custom_injects = self.get_page_config(url).get("inject", {})
def injects_custom_tags(section): def injects_custom_tags(section):
@ -456,6 +429,7 @@ class Parser():
injects_custom_tags("head") injects_custom_tags("head")
injects_custom_tags("body") injects_custom_tags("body")
# inject loconotion's custom stylesheet and script # inject loconotion's custom stylesheet and script
loconotion_custom_css = self.cache_file(Path("bundles/loconotion.css")) loconotion_custom_css = self.cache_file(Path("bundles/loconotion.css"))
custom_css = soup.new_tag("link", rel="stylesheet", href=str(loconotion_custom_css)) custom_css = soup.new_tag("link", rel="stylesheet", href=str(loconotion_custom_css))
@ -464,6 +438,7 @@ class Parser():
custom_script = soup.new_tag("script", type="text/javascript", src=str(loconotion_custom_js)) custom_script = soup.new_tag("script", type="text/javascript", src=str(loconotion_custom_js))
soup.body.insert(-1, custom_script) soup.body.insert(-1, custom_script)
# find sub-pages and clean slugs / links # find sub-pages and clean slugs / links
sub_pages = []; sub_pages = [];
for a in soup.findAll('a'): for a in soup.findAll('a'):
@ -483,6 +458,7 @@ class Parser():
sub_pages.append(sub_page_href) sub_pages.append(sub_page_href)
log.debug(f"Found link to page {a['href']}") log.debug(f"Found link to page {a['href']}")
# exports the parsed page # exports the parsed page
html_str = str(soup) html_str = str(soup)
html_file = self.get_page_slug(url) if url != index else "index.html" html_file = self.get_page_slug(url) if url != index else "index.html"
@ -494,6 +470,7 @@ class Parser():
f.write(html_str.encode('utf-8').strip()) f.write(html_str.encode('utf-8').strip())
processed_pages[url] = html_file processed_pages[url] = html_file
# parse sub-pages # parse sub-pages
if (sub_pages and not self.args.get("single_page", False)): if (sub_pages and not self.args.get("single_page", False)):
if (processed_pages): log.debug(f"Pages processed so far: {len(processed_pages)}") if (processed_pages): log.debug(f"Pages processed so far: {len(processed_pages)}")
@ -501,92 +478,14 @@ class Parser():
if not sub_page in processed_pages.keys(): if not sub_page in processed_pages.keys():
self.parse_page(sub_page, processed_pages = processed_pages, index = index) self.parse_page(sub_page, processed_pages = processed_pages, index = index)
#we're all done! #we're all done!
return processed_pages return processed_pages
def run(self, url): def run(self, url):
start_time = time.time() start_time = time.time()
total_processed_pages = self.parse_page(url) total_processed_pages = self.parse_page(url)
elapsed_time = time.time() - start_time elapsed_time = time.time() - start_time
formatted_time = '{:02d}:{:02d}:{:02d}'.format(int(elapsed_time // 3600), int(elapsed_time % 3600 // 60), int(elapsed_time % 60)) formatted_time = '{:02d}:{:02d}:{:02d}'.format(int(elapsed_time // 3600), int(elapsed_time % 3600 // 60), int(elapsed_time % 60))
log.info(f'Finished!\n\n\tヽ( ・‿・)ノ Processed {len(total_processed_pages)} pages in {formatted_time}') log.info(f'Finished!\n\nProcessed {len(total_processed_pages)} pages in {formatted_time}')
if __name__ == '__main__':
# set up argument parser
parser = argparse.ArgumentParser(description='Generate static websites from Notion.so pages')
parser.add_argument('target', help='The config file containing the site properties, or the url of the Notion.so page to generate the site from')
parser.add_argument('--chromedriver', help='Use a specific chromedriver executable instead of the auto-installing one')
parser.add_argument("--single-page", action="store_true", default=False, help="Only parse the first page, then stop")
parser.add_argument('--clean', action='store_true', default=False, help='Delete all previously cached files for the site before generating it')
parser.add_argument("-v", "--verbose", action="store_true", help="Shows way more exciting facts in the output")
args = parser.parse_args()
# set up some pretty logs
log = logging.getLogger("loconotion")
log.setLevel(logging.INFO if not args.verbose else logging.DEBUG)
log_screen_handler = logging.StreamHandler(stream=sys.stdout)
log.addHandler(log_screen_handler)
log.propagate = False
try:
import colorama, copy
LOG_COLORS = {
logging.DEBUG: colorama.Fore.GREEN,
logging.INFO: colorama.Fore.BLUE,
logging.WARNING: colorama.Fore.YELLOW,
logging.ERROR: colorama.Fore.RED,
logging.CRITICAL: colorama.Back.RED
}
class ColorFormatter(logging.Formatter):
def format(self, record, *args, **kwargs):
# if the corresponding logger has children, they may receive modified
# record, so we want to keep it intact
new_record = copy.copy(record)
if new_record.levelno in LOG_COLORS:
new_record.levelname = "{color_begin}{level}{color_end}".format(
level=new_record.levelname,
color_begin=LOG_COLORS[new_record.levelno],
color_end=colorama.Style.RESET_ALL,
)
return super(ColorFormatter, self).format(new_record, *args, **kwargs)
log_screen_handler.setFormatter(ColorFormatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
datefmt="{color_begin}[%H:%M:%S]{color_end}".format(
color_begin=colorama.Style.DIM,
color_end=colorama.Style.RESET_ALL
)))
except ModuleNotFoundError as identifier:
pass
# parse the provided arguments
try:
if urllib.parse.urlparse(args.target).scheme:
try:
response = requests.get(args.target)
if ("notion.so" in args.target):
log.info("Initialising parser with simple page url")
config = { "page" : args.target }
Parser(config = config, args = vars(args))
else:
log.critical(f"{args.target} is not a notion.so page")
except requests.ConnectionError as exception:
log.critical(f"Connection error")
else:
if Path(args.target).is_file():
with open(args.target) as f:
parsed_config = toml.loads(f.read())
log.info(f"Initialising parser with configuration file")
log.debug(parsed_config)
Parser(config = parsed_config, args = vars(args))
else:
log.critical(f"Config file {args.target} does not exists")
except FileNotFoundError as e:
log.critical(f'FileNotFoundError: {e}')
sys.exit(0)
except KeyboardInterrupt:
log.critical('Interrupted by user')
try:
sys.exit(0)
except SystemExit:
os._exit(0)