mirror of
https://github.com/leoncvlt/loconotion.git
synced 2024-08-30 18:12:12 +00:00
Added anchor links processing and --chromedriver argument
parent
5962d7232f
commit
c489e0a8c1
3
.gitignore
vendored
@ -113,6 +113,5 @@ dmypy.json
|
|||||||
env
|
env
|
||||||
dist/*
|
dist/*
|
||||||
test/*
|
test/*
|
||||||
debug.log
|
logs/*
|
||||||
webdrive.log
|
|
||||||
*.bat
|
*.bat
|
83
README.md
@ -8,6 +8,7 @@ Some services like Super, HostingPotion, HostNotion and Fruition try to work aro
|
|||||||
- **Not free** - Super, HostingPotion and HostNotion all take a monthly fee since they manage all the "hacky bits" for you; Fruition is open-source but any domain with a decent amount of daily visits will soon clash against Cloudflare's free tier limitations, and force you to upgrade to the $5 or more plan (plus you need to set up Cloudflare yourself)
|
- **Not free** - Super, HostingPotion and HostNotion all take a monthly fee since they manage all the "hacky bits" for you; Fruition is open-source but any domain with a decent amount of daily visits will soon clash against Cloudflare's free tier limitations, and force you to upgrade to the $5 or more plan (plus you need to set up Cloudflare yourself)
|
||||||
- **Slow-ish** - As the page is still hosted on Notion, it comes bundled with all their analytics, editing / collaboration javascript, vendors css, and more bloat which causes the page to load at speeds that are not exactly appropriate for a simple blog / website. Running [this](https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e) example page on Google's [PageSpeed Insights](https://developers.google.com/speed/pagespeed/insights/) scores a measly **24 - 66** on mobile / desktop.
|
- **Slow-ish** - As the page is still hosted on Notion, it comes bundled with all their analytics, editing / collaboration javascript, vendors css, and more bloat which causes the page to load at speeds that are not exactly appropriate for a simple blog / website. Running [this](https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e) example page on Google's [PageSpeed Insights](https://developers.google.com/speed/pagespeed/insights/) scores a measly **24 - 66** on mobile / desktop.
|
||||||
- **Ugly URLs** - While the services above enable the use of custom domains, the URLs for individual pages are stuck with the long, ugly, original Notion URL (apart from Fruition - they got custom URLs figured out, although you will always see the original URL flashing for an instant when the page is loaded).
|
- **Ugly URLs** - While the services above enable the use of custom domains, the URLs for individual pages are stuck with the long, ugly, original Notion URL (apart from Fruition - they got custom URLs figured out, although you will always see the original URL flashing for an instant when the page is loaded).
|
||||||
|
- **Notion Free Account Limitations** - Recently Notion introduced a change to its pricing model where public pages can't be set to be indexed by search engines on a free account (but they also removed the blocks count limitations, which is a good trade-off if you ask me)
|
||||||
|
|
||||||
Loconotion approaches this a bit differently. It lets Notion render the page, then scrapes it and saves a static version of the page to disk. This offers the following benefits:
|
Loconotion approaches this a bit differently. It lets Notion render the page, then scrapes it and saves a static version of the page to disk. This offers the following benefits:
|
||||||
- Strips out all the unnecessary bloat, like Notion's analytics, vendors scripts / styles, and javascript left in to enable collaboration.
|
- Strips out all the unnecessary bloat, like Notion's analytics, vendors scripts / styles, and javascript left in to enable collaboration.
|
||||||
@ -28,11 +29,16 @@ Bear in mind that as we are effectively parsing a static version of the page, th
|
|||||||
|
|
||||||
Everything else should be fine. Loconotion rebuilds the logic for toggle boxes and embeds so they still work; plus it defines some additional CSS rules to enable mobile responsiveness across the whole site (in some cases looking even better than Notion's defaults, which weren't exactly designed with mobile in mind).
|
Everything else should be fine. Loconotion rebuilds the logic for toggle boxes and embeds so they still work; plus it defines some additional CSS rules to enable mobile responsiveness across the whole site (in some cases looking even better than Notion's defaults, which weren't exactly designed with mobile in mind).
|
||||||
|
|
||||||
|
### But Notion already has an HTML export function?
|
||||||
|
It does, but I wasn't really happy with the styling - the pages looked a bit uglier than what they look like on a live Notion page. Plus, it doesn't support all the cool customization features outlined above!
|
||||||
|
|
||||||
## Installation & Requirements
|
## Installation & Requirements
|
||||||
`pip install -r requirements.txt`
|
`pip install -r requirements.txt`
|
||||||
|
|
||||||
This script uses [ChromeDriver](https://chromedriver.chromium.org) to automate the Google Chrome browser - therefore Google Chrome needs to be installed in order to work.
|
This script uses [ChromeDriver](https://chromedriver.chromium.org) to automate the Google Chrome browser - therefore Google Chrome needs to be installed in order to work.
|
||||||
|
|
||||||
|
The script comes bundled with the default Windows chromedriver executable. On Mac / Linux, download the right distribution for you from https://chromedriver.chromium.org/downloads and place the executable in this folder. Alternatively, use the `--chromedriver` argument to specify its path at runtime.
|
||||||
|
|
||||||
## Simple Usage
|
## Simple Usage
|
||||||
`python loconotion.py https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e`
|
`python loconotion.py https://www.notion.so/The-perfect-It-s-Always-Sunny-in-Philadelphia-episode-d08aaec2b24946408e8be0e9f2ae857e`
|
||||||
|
|
||||||
@ -44,15 +50,24 @@ You can fully configure Loconotion to your needs by passing a [.toml](https://gi
|
|||||||
|
|
||||||
Here's what a full configuration would look like, along with explanations for each parameter.
|
Here's what a full configuration would look like, along with explanations for each parameter.
|
||||||
```toml
|
```toml
|
||||||
|
## Loconotion Site Configuration File ##
|
||||||
|
# full .toml configuration example file to showcase all of Loconotion's available settings
|
||||||
|
# check out https://github.com/toml-lang/toml for more info on the toml format
|
||||||
|
|
||||||
|
# name of the folder that the site will be generated in
|
||||||
name = "Notion Test Site"
|
name = "Notion Test Site"
|
||||||
# the notion.so page to begin parsing from. This page will become the index.html
|
# the notion.so page to begin parsing from. This page will become the index.html
|
||||||
# of the generated site, and Loconotion will parse all sub-pages present on the page.
|
# of the generated site, and Loconotion will parse all sub-pages present on the page.
|
||||||
page = "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
|
page = "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
|
||||||
|
|
||||||
# this site table defines override settings for the whole site
|
## Global Site Settings ##
|
||||||
|
# this [site] table defines override settings for the whole site
|
||||||
# later on we will see how to define settings for a single page
|
# later on we will see how to define settings for a single page
|
||||||
[site]
|
[site]
|
||||||
## custom meta tags ##
|
## Custom Meta Tags ##
|
||||||
|
# defined as an array of tables (double square brackets)
|
||||||
|
# each key in the table maps to an attribute in the tag
|
||||||
|
# the following adds the tag <meta name="title" content="Loconotion Test Site"/>
|
||||||
[[site.meta]]
|
[[site.meta]]
|
||||||
name = "title"
|
name = "title"
|
||||||
content = "Loconotion Test Site"
|
content = "Loconotion Test Site"
|
||||||
@ -60,10 +75,10 @@ page = "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
|
|||||||
name = "description"
|
name = "description"
|
||||||
content = "A static site generated from a Notion.so page using Loconotion"
|
content = "A static site generated from a Notion.so page using Loconotion"
|
||||||
|
|
||||||
## custom site fonts ##
|
## Custom Fonts ##
|
||||||
# you can specify the name of a google font to use on the site, use the font embed name
|
# you can specify the name of a google font to use on the site - use the font embed name
|
||||||
# (if in doubt select a style on fonts.google.com and navigate to the "embed" tag to check the name under CSS rules)
|
# if in doubt select a style on fonts.google.com and navigate to the "embed" tag to check the name under CSS rules
|
||||||
# keys controls the font of the following elements:
|
# the table keys control the font of the following elements:
|
||||||
# site: changes the font for the whole page (apart from code blocks) but the following settings override it
|
# site: changes the font for the whole page (apart from code blocks) but the following settings override it
|
||||||
# navbar: site breadcrumbs on the top-left of the page
|
# navbar: site breadcrumbs on the top-left of the page
|
||||||
# title: page title (under the icon)
|
# title: page title (under the icon)
|
||||||
@ -73,19 +88,19 @@ page = "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
|
|||||||
# body: non-heading text on the page
|
# body: non-heading text on the page
|
||||||
# code: text inside code blocks
|
# code: text inside code blocks
|
||||||
[site.fonts]
|
[site.fonts]
|
||||||
site = 'Roboto'
|
site = 'Lato'
|
||||||
navbar = ''
|
navbar = ''
|
||||||
title = 'Montserrat'
|
title = 'Montserrat'
|
||||||
h1 = 'Montserrat'
|
h1 = 'Montserrat'
|
||||||
h2 = 'Montserrat'
|
h2 = 'Montserrat'
|
||||||
h3 = ''
|
h3 = 'Montserrat'
|
||||||
body = ''
|
body = ''
|
||||||
code = ''
|
code = ''
|
||||||
|
|
||||||
## custom element injection ##
|
## Custom Element Injection ##
|
||||||
# 'head' or 'body' to set where the element will be injected
|
# defined as an array of tables [[site.inject]], followed by 'head' or 'body' to set the injection point,
|
||||||
# the next dotted key represents the tag to inject, with the table values being the the tag attributes
|
# followed by the name of the tag to inject. Each key in the table maps to an attribute in the tag
|
||||||
# e.g. the following injects <link href="favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/> in the <head>
|
# the following injects <link href="favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/> in the <head>
|
||||||
[[site.inject.head.link]]
|
[[site.inject.head.link]]
|
||||||
rel="icon"
|
rel="icon"
|
||||||
sizes="16x16"
|
sizes="16x16"
|
||||||
@ -97,33 +112,37 @@ page = "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
|
|||||||
type="text/javascript"
|
type="text/javascript"
|
||||||
src="/example/custom-script.js"
|
src="/example/custom-script.js"
|
||||||
|
|
||||||
## individual page settings ##
|
## Individual Page Settings ##
|
||||||
# while the [site] table applies the settings to all parse pages,
|
# the [pages] table defines override settings for individual pages, by defining a sub-table named after the page url
|
||||||
# it's possible to override a single page's setting by defining
|
# (or part of the url, but be careful not to use a string that appears in multiple page urls)
|
||||||
# a table named after the page url or part of it.
|
[pages]
|
||||||
#
|
# the following settings will only apply to this page: https://www.notion.so/d2fa06f244e64f66880bb0491f58223d
|
||||||
# e.g the following settings will only apply to this parsed page:
|
[pages.d2fa06f244e64f66880bb0491f58223d]
|
||||||
# https://www.notion.so/d2fa06f244e64f66880bb0491f58223d
|
## custom slugs ##
|
||||||
[d2fa06f244e64f66880bb0491f58223d]
|
# inside page settings, you can change the url that page will map to with the 'slug' key
|
||||||
## custom slugs ##
|
# e.g. page "/d2fa06f244e64f66880bb0491f58223d" will now map to "/list"
|
||||||
# inside page settings, you can change the url that page will map to with the 'slug' key
|
slug = "list"
|
||||||
# e.g. page "/d2fa06f244e64f66880bb0491f58223d" will now map to "/list"
|
|
||||||
slug = "list"
|
|
||||||
|
|
||||||
[[d2fa06f244e64f66880bb0491f58223d.meta]]
|
# change the description meta tag for this page only
|
||||||
# change the description meta tag for this page only
|
[[pages.d2fa06f244e64f66880bb0491f58223d.meta]]
|
||||||
name = "description"
|
name = "description"
|
||||||
content = "A fullscreen list database page, now with a pretty slug"
|
content = "A fullscreen list database page, now with a pretty slug"
|
||||||
|
|
||||||
[d2fa06f244e64f66880bb0491f58223d.fonts]
|
# change the title font for this page only
|
||||||
# change the title font for this page only
|
[pages.d2fa06f244e64f66880bb0491f58223d.fonts]
|
||||||
title = 'Nunito'
|
title = 'Nunito'
|
||||||
|
|
||||||
|
# for smaller sets of settings you can use inline notation
|
||||||
|
# 2483a3a5c3fd445980c1adc8e550b552.slug = "gallery"
|
||||||
|
# 2604ce45890645c79f67d92833083fee.slug = "table"
|
||||||
|
# a28dba2e7a67448da52f2cd2c641407b.slug = "board"
|
||||||
```
|
```
|
||||||
|
|
||||||
On top of this, the script can take a few extra arguments:
|
On top of this, the script can take these optional arguments:
|
||||||
```
|
```
|
||||||
--clean Delete all previously cached files for the site before generating it
|
--clean Delete all previously cached files for the site before generating it
|
||||||
-v, --verbose Shows way more exciting facts in the output
|
-v, --verbose Shows way more exciting facts in the output
|
||||||
|
--single-page Don't parse sub-pages
|
||||||
```
|
```
|
||||||
|
|
||||||
## Roadmap / Features wishlist
|
## Roadmap / Features wishlist
|
||||||
|
90
example/example_site.toml
Normal file
@ -0,0 +1,90 @@
|
|||||||
|
## Loconotion Site Configuration File ##
|
||||||
|
# full .toml configuration example file to showcase all of Loconotion's available settings
|
||||||
|
# check out https://github.com/toml-lang/toml for more info on the toml format
|
||||||
|
|
||||||
|
# name of the folder that the site will be generated in
|
||||||
|
name = "Notion Test Site"
|
||||||
|
# the notion.so page to begin parsing from. This page will become the index.html
|
||||||
|
# of the generated site, and Loconotion will parse all sub-pages present on the page.
|
||||||
|
page = "https://www.notion.so/Loconotion-Example-Page-03c403f4fdc94cc1b315b9469a8950ef"
|
||||||
|
|
||||||
|
## Global Site Settings ##
|
||||||
|
# this [site] table defines override settings for the whole site
|
||||||
|
# later on we will see how to define settings for a single page
|
||||||
|
[site]
|
||||||
|
## Custom Meta Tags ##
|
||||||
|
# defined as an array of tables (double square brackets)
|
||||||
|
# each key in the table maps to an attribute in the tag
|
||||||
|
# the following adds the tag <meta name="title" content="Loconotion Test Site"/>
|
||||||
|
[[site.meta]]
|
||||||
|
name = "title"
|
||||||
|
content = "Loconotion Test Site"
|
||||||
|
[[site.meta]]
|
||||||
|
name = "description"
|
||||||
|
content = "A static site generated from a Notion.so page using Loconotion"
|
||||||
|
|
||||||
|
## Custom Fonts ##
|
||||||
|
# you can specify the name of a google font to use on the site - use the font embed name
|
||||||
|
# if in doubt select a style on fonts.google.com and navigate to the "embed" tag to check the name under CSS rules
|
||||||
|
# the table keys control the font of the following elements:
|
||||||
|
# site: changes the font for the whole page (apart from code blocks) but the following settings override it
|
||||||
|
# navbar: site breadcrumbs on the top-left of the page
|
||||||
|
# title: page title (under the icon)
|
||||||
|
# h1: heading blocks, and inline databases' titles
|
||||||
|
# h2: sub-heading blocks
|
||||||
|
# h3: sub-sub-heading blocks
|
||||||
|
# body: non-heading text on the page
|
||||||
|
# code: text inside code blocks
|
||||||
|
[site.fonts]
|
||||||
|
site = 'Lato'
|
||||||
|
navbar = ''
|
||||||
|
title = 'Montserrat'
|
||||||
|
h1 = 'Montserrat'
|
||||||
|
h2 = 'Montserrat'
|
||||||
|
h3 = 'Montserrat'
|
||||||
|
body = ''
|
||||||
|
code = ''
|
||||||
|
|
||||||
|
## Custom Element Injection ##
|
||||||
|
# defined as an array of tables [[site.inject]], followed by 'head' or 'body' to set the injection point,
|
||||||
|
# followed by the name of the tag to inject. Each key in the table maps to an attribute in the tag
|
||||||
|
# the following injects <link href="favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/> in the <head>
|
||||||
|
[[site.inject.head.link]]
|
||||||
|
rel="icon"
|
||||||
|
sizes="16x16"
|
||||||
|
type="image/png"
|
||||||
|
href="/example/favicon-16x16.png"
|
||||||
|
|
||||||
|
# the following injects <script src="custom-script.js" type="text/javascript"></script> in the <body>
|
||||||
|
# note that all href / src files are copied to the root of the site folder regardless of their original path
|
||||||
|
[[site.inject.body.script]]
|
||||||
|
type="text/javascript"
|
||||||
|
src="/example/custom-script.js"
|
||||||
|
|
||||||
|
## Individual Page Settings ##
|
||||||
|
# the [pages] table defines override settings for individual pages, by defining a sub-table named after the page url
|
||||||
|
# (or part of the url, but be careful not to use a string that appears in multiple page urls)
|
||||||
|
[pages]
|
||||||
|
# the following settings will only apply to this page: https://www.notion.so/d2fa06f244e64f66880bb0491f58223d
|
||||||
|
[pages.d2fa06f244e64f66880bb0491f58223d]
|
||||||
|
## custom slugs ##
|
||||||
|
# inside page settings, you can change the url that page will map to with the 'slug' key
|
||||||
|
# e.g. page "/d2fa06f244e64f66880bb0491f58223d" will now map to "/games-list"
|
||||||
|
slug = "games-list"
|
||||||
|
|
||||||
|
# change the description meta tag for this page only
|
||||||
|
[[pages.d2fa06f244e64f66880bb0491f58223d.meta]]
|
||||||
|
name = "description"
|
||||||
|
content = "A fullscreen list database page, now with a pretty slug"
|
||||||
|
|
||||||
|
# change the title font for this page only
|
||||||
|
[pages.d2fa06f244e64f66880bb0491f58223d.fonts]
|
||||||
|
title = 'Nunito'
|
||||||
|
|
||||||
|
# set up pretty slugs for the other database pages
|
||||||
|
[pages.54dab6011e604430a21dc477cb8e4e3a]
|
||||||
|
slug = "film-gallery"
|
||||||
|
[pages.2604ce45890645c79f67d92833083fee]
|
||||||
|
slug = "books-table"
|
||||||
|
[pages.ae0a85c527824a3a855b7f4d31f4e0fc]
|
||||||
|
slug = "random-board"
|
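The per-page overrides in the example file can be illustrated with a small lookup. The following is a minimal sketch only: the `config` dict is shaped the way a TOML parser would return a trimmed-down version of the file above, and `find_page_settings` and `title_font` are hypothetical helpers, not functions taken from loconotion.py. It assumes a substring match on the page url, as the config comments describe.

```python
# Hypothetical sketch: a dict shaped like a parsed, trimmed-down example_site.toml.
config = {
    "site": {"fonts": {"title": "Montserrat"}},
    "pages": {
        "d2fa06f244e64f66880bb0491f58223d": {
            "slug": "games-list",
            "fonts": {"title": "Nunito"},
        },
        "2604ce45890645c79f67d92833083fee": {"slug": "books-table"},
    },
}


def find_page_settings(config, url):
    """Return the first [pages] sub-table whose key appears in the url, else {}."""
    for key, settings in config.get("pages", {}).items():
        if key in url:
            return settings
    return {}


def title_font(config, url):
    """Use the page-level title font if set, otherwise the site-level one."""
    page_fonts = find_page_settings(config, url).get("fonts", {})
    site_fonts = config.get("site", {}).get("fonts", {})
    return page_fonts.get("title") or site_fonts.get("title")
```

Because the match is a substring test, a short key that occurs in several page urls would select the wrong table, which is why the config comment warns against it.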
@ -64,3 +64,18 @@ for (let i = 0; i < collectionSearchBoxes.length; i++) {
|
|||||||
const collectionSearchBox = collectionSearchBoxes.item(i).parentElement;
|
const collectionSearchBox = collectionSearchBoxes.item(i).parentElement;
|
||||||
collectionSearchBox.style.display = "none";
|
collectionSearchBox.style.display = "none";
|
||||||
}
|
}
|
||||||
|
|
||||||
|
const anchorLinks = document.querySelectorAll("a.loconotion-anchor-link");
|
||||||
|
for (let i = 0; i < anchorLinks.length; i++) {
|
||||||
|
const anchorLink = anchorLinks.item(i);
|
||||||
|
const id = anchorLink.getAttribute("href").replace("#", "");
|
||||||
|
const targetBlockId =
|
||||||
|
id.slice(0, 8) + "-" + id.slice(8, 12) + "-" + id.slice(12, 16) + "-" + id.slice(16, 20) + "-" + id.slice(20);
|
||||||
|
anchorLink.addEventListener("click", (e) => {
|
||||||
|
e.preventDefault();
|
||||||
|
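// a bare block id is not a valid CSS selector; this assumes Notion blocks carry a data-block-id attribute
document.querySelector("div[data-block-id='" + targetBlockId + "']").scrollIntoView({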
|
||||||
|
behavior: "smooth",
|
||||||
|
block: "start",
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
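The slicing in the anchor link handler above rebuilds a dashed 8-4-4-4-12 block id from the 32-character id in the link's `href`. As a rough illustration in Python (the function name `to_block_id` is made up for this sketch), the same transformation looks like this:

```python
def to_block_id(anchor_href):
    """Turn '#<32 hex chars>' into the dashed 8-4-4-4-12 form used for block ids."""
    raw = anchor_href.replace("#", "")
    return "-".join([raw[0:8], raw[8:12], raw[12:16], raw[16:20], raw[20:]])
```

For a well-formed 32-character hex id this matches the string form produced by Python's `uuid.UUID`.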
178
loconotion.py
@ -1,4 +1,5 @@
|
|||||||
import os
|
import os
|
||||||
|
import platform
|
||||||
import sys
|
import sys
|
||||||
import shutil
|
import shutil
|
||||||
import time
|
import time
|
||||||
@ -12,21 +13,25 @@ import hashlib
|
|||||||
import argparse
|
import argparse
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from selenium import webdriver
|
|
||||||
from selenium.webdriver.chrome.options import Options
|
|
||||||
from selenium.common.exceptions import TimeoutException, NoSuchElementException
|
|
||||||
from selenium.webdriver.support import expected_conditions as EC
|
|
||||||
from selenium.webdriver.common.by import By
|
|
||||||
from selenium.webdriver.support.ui import WebDriverWait
|
|
||||||
from bs4 import BeautifulSoup
|
|
||||||
|
|
||||||
import requests
|
|
||||||
import toml
|
|
||||||
import cssutils
|
|
||||||
cssutils.log.setLevel(logging.CRITICAL) # removes warning logs from cssutils
|
|
||||||
|
|
||||||
log = logging.getLogger(__name__)
|
log = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
try:
|
||||||
|
from selenium import webdriver
|
||||||
|
from selenium.webdriver.chrome.options import Options
|
||||||
|
from selenium.common.exceptions import TimeoutException, NoSuchElementException
|
||||||
|
from selenium.webdriver.support import expected_conditions as EC
|
||||||
|
from selenium.webdriver.common.by import By
|
||||||
|
from selenium.webdriver.support.ui import WebDriverWait
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
|
||||||
|
import requests
|
||||||
|
import toml
|
||||||
|
import cssutils
|
||||||
|
cssutils.log.setLevel(logging.CRITICAL) # removes warning logs from cssutils
|
||||||
|
except ModuleNotFoundError as error:
|
||||||
|
log.critical(f"ModuleNotFoundError: {error}. Have you installed the requirements?")
|
||||||
|
sys.exit()
|
||||||
|
|
||||||
class notion_page_loaded(object):
|
class notion_page_loaded(object):
|
||||||
"""An expectation for checking that a notion page has loaded.
|
"""An expectation for checking that a notion page has loaded.
|
||||||
"""
|
"""
|
||||||
@ -167,29 +172,34 @@ class Parser():
|
|||||||
if not matching_file:
|
if not matching_file:
|
||||||
# if url has a network scheme, download the file
|
# if url has a network scheme, download the file
|
||||||
if "http" in urllib.parse.urlparse(url).scheme:
|
if "http" in urllib.parse.urlparse(url).scheme:
|
||||||
# Disabling proxy speeds up requests time
|
try:
|
||||||
# https://stackoverflow.com/questions/45783655/first-https-request-takes-much-more-time-than-the-rest
|
# Disabling proxy speeds up requests time
|
||||||
# https://stackoverflow.com/questions/28521535/requests-how-to-disable-bypass-proxy
|
# https://stackoverflow.com/questions/45783655/first-https-request-takes-much-more-time-than-the-rest
|
||||||
session = requests.Session()
|
# https://stackoverflow.com/questions/28521535/requests-how-to-disable-bypass-proxy
|
||||||
session.trust_env = False
|
session = requests.Session()
|
||||||
log.info(f"Downloading '{url}'")
|
session.trust_env = False
|
||||||
response = session.get(url)
|
log.info(f"Downloading '{url}'")
|
||||||
|
response = session.get(url)
|
||||||
|
|
||||||
# if the filename does not have an extension at this point,
|
# if the filename does not have an extension at this point,
|
||||||
# try to infer it from the url, and if not possible,
|
# try to infer it from the url, and if not possible,
|
||||||
# from the content-type header mimetype
|
# from the content-type header mimetype
|
||||||
if (not destination.suffix):
|
if (not destination.suffix):
|
||||||
file_extension = Path(urllib.parse.urlparse(url).path).suffix
|
file_extension = Path(urllib.parse.urlparse(url).path).suffix
|
||||||
if (not file_extension):
|
if (not file_extension):
|
||||||
content_type = response.headers.get('content-type')
|
content_type = response.headers.get('content-type')
|
||||||
file_extension = mimetypes.guess_extension(content_type)
|
if (content_type):
|
||||||
destination = destination.with_suffix(file_extension)
|
file_extension = mimetypes.guess_extension(content_type)
|
||||||
|
destination = destination.with_suffix(file_extension)
|
||||||
|
|
||||||
Path(destination).parent.mkdir(parents=True, exist_ok=True)
|
Path(destination).parent.mkdir(parents=True, exist_ok=True)
|
||||||
with open(destination, "wb") as f:
|
with open(destination, "wb") as f:
|
||||||
f.write(response.content)
|
f.write(response.content)
|
||||||
|
|
||||||
return destination.relative_to(self.dist_folder)
|
return destination.relative_to(self.dist_folder)
|
||||||
|
except Exception as error:
|
||||||
|
log.error(f"Error downloading file '{url}': {error}")
|
||||||
|
return url
|
||||||
# if not, check if it's a local file, and copy it to the dist folder
|
# if not, check if it's a local file, and copy it to the dist folder
|
||||||
else:
|
else:
|
||||||
if Path(url).is_file():
|
if Path(url).is_file():
|
||||||
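The extension fallback in the hunk above (url suffix first, then the content-type header) can be sketched in isolation. This is an illustrative helper only: `infer_extension` is a hypothetical name, and stripping header parameters such as `; charset=utf-8` before the lookup is an addition of this sketch, not something the commit does.

```python
import mimetypes
import urllib.parse
from pathlib import Path


def infer_extension(url, content_type=None):
    """Guess a file extension from the url path, falling back to the content-type header."""
    extension = Path(urllib.parse.urlparse(url).path).suffix
    if not extension and content_type:
        # drop parameters such as '; charset=utf-8' before the mimetype lookup
        mime = content_type.split(";")[0].strip()
        # guess_extension returns None for unknown types, so fall back to ""
        extension = mimetypes.guess_extension(mime) or ""
    return extension
```

Returning an empty string for unknown types keeps a later `with_suffix` call safe, since `Path.with_suffix` does not accept `None`.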
@ -202,10 +212,24 @@ class Parser():
|
|||||||
cached_file = Path(matching_file[0]).relative_to(self.dist_folder)
|
cached_file = Path(matching_file[0]).relative_to(self.dist_folder)
|
||||||
log.debug(f"'{url}' was already downloaded")
|
log.debug(f"'{url}' was already downloaded")
|
||||||
return cached_file
|
return cached_file
|
||||||
# if all fails, return the original url
|
|
||||||
return url
|
|
||||||
|
|
||||||
def init_chromedriver(self):
|
def init_chromedriver(self):
|
||||||
|
exec_extension = ".exe" if platform.system() == "Windows" else ""
|
||||||
|
chromedriver_path = Path.cwd() / self.args.get("chromedriver")
|
||||||
|
|
||||||
|
# add the .exe extension on Windows if omitted
|
||||||
|
if (not chromedriver_path.suffix):
|
||||||
|
chromedriver_path = chromedriver_path.with_suffix(exec_extension)
|
||||||
|
|
||||||
|
# check the chromedriver executable exists
|
||||||
|
if (not chromedriver_path.is_file()):
|
||||||
|
log.critical(f"Chromedriver not found at {chromedriver_path}." +
|
||||||
|
" Download the correct distribution at https://chromedriver.chromium.org/downloads")
|
||||||
|
sys.exit()
|
||||||
|
|
||||||
|
logs_path = (Path.cwd() / "logs" / "webdrive.log")
|
||||||
|
logs_path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
log.info("Initialising chrome driver")
|
log.info("Initialising chrome driver")
|
||||||
chrome_options = Options()
|
chrome_options = Options()
|
||||||
chrome_options.add_argument("--headless")
|
chrome_options.add_argument("--headless")
|
||||||
@ -213,11 +237,11 @@ class Parser():
|
|||||||
chrome_options.add_argument("--log-level=3");
|
chrome_options.add_argument("--log-level=3");
|
||||||
chrome_options.add_argument("--silent");
|
chrome_options.add_argument("--silent");
|
||||||
chrome_options.add_argument("--disable-logging")
|
chrome_options.add_argument("--disable-logging")
|
||||||
# removes the 'DevTools listening' log message
|
# removes the 'DevTools listening' log message
|
||||||
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])
|
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])
|
||||||
return webdriver.Chrome(
|
return webdriver.Chrome(
|
||||||
executable_path=str(Path.cwd() / "bin" / "chromedriver.exe"),
|
executable_path=str(chromedriver_path),
|
||||||
service_log_path=str(Path.cwd() / "webdrive.log"),
|
service_log_path=str(logs_path),
|
||||||
options=chrome_options)
|
options=chrome_options)
|
||||||
|
|
||||||
def parse_page(self, url, processed_pages = {}, index = None):
|
def parse_page(self, url, processed_pages = {}, index = None):
|
||||||
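The path handling added to `init_chromedriver` can be summarised as a small standalone function. This is a sketch under assumptions: `resolve_chromedriver` and its `system` / `base` parameters are hypothetical and exist only so the behaviour can be exercised without depending on the current platform or working directory.

```python
import platform
from pathlib import Path


def resolve_chromedriver(path_arg, system=None, base=None):
    """Resolve the --chromedriver argument, appending .exe on Windows if omitted."""
    system = system or platform.system()
    path = (base or Path.cwd()) / path_arg
    if not path.suffix and system == "Windows":
        path = path.with_suffix(".exe")
    return path
```

With the default `bin/chromedriver`, this yields `bin/chromedriver.exe` on Windows and leaves the path unchanged elsewhere; the caller would still need to check that the file exists.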
@ -429,8 +453,18 @@ class Parser():
|
|||||||
for a in soup.findAll('a'):
|
for a in soup.findAll('a'):
|
||||||
if a['href'].startswith('/'):
|
if a['href'].startswith('/'):
|
||||||
sub_page_href = 'https://www.notion.so' + a['href']
|
sub_page_href = 'https://www.notion.so' + a['href']
|
||||||
|
# if the link is an anchor link, check if the page hasn't already been parsed
|
||||||
|
if ("#" in sub_page_href):
|
||||||
|
sub_page_href_tokens = sub_page_href.split("#")
|
||||||
|
sub_page_href = sub_page_href_tokens[0]
|
||||||
|
a['href'] = "#" + sub_page_href_tokens[-1]
|
||||||
|
a['class'] = a.get('class', []) + ['loconotion-anchor-link']
|
||||||
|
if (sub_page_href in processed_pages.keys() or sub_page_href in sub_pages):
|
||||||
|
log.debug(f"Original page for anchor link {sub_page_href} already parsed / pending parsing, skipping")
|
||||||
|
continue
|
||||||
|
else:
|
||||||
|
a['href'] = self.get_page_slug(sub_page_href) if sub_page_href != index else "index.html"
|
||||||
sub_pages.append(sub_page_href)
|
sub_pages.append(sub_page_href)
|
||||||
a['href'] = self.get_page_slug(sub_page_href) if sub_page_href != index else "index.html"
|
|
||||||
log.debug(f"Found link to page {a['href']}")
|
log.debug(f"Found link to page {a['href']}")
|
||||||
|
|
||||||
# exports the parsed page
|
# exports the parsed page
|
||||||
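The anchor handling in the hunk above splits a link into the page part, which decides whether the page still needs parsing, and the fragment, which becomes the new `href`. A minimal sketch of that split follows; `split_anchor` is a hypothetical helper name, not part of the commit.

```python
def split_anchor(href):
    """Split 'page#fragment' into (page, '#fragment'); fragment is None without '#'."""
    if "#" not in href:
        return href, None
    tokens = href.split("#")
    # first token is the page url, last token is the block the anchor points to
    return tokens[0], "#" + tokens[-1]
```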
@ -446,7 +480,7 @@ class Parser():
|
|||||||
|
|
||||||
# parse sub-pages
|
# parse sub-pages
|
||||||
if (sub_pages and not self.args.get("single_page", False)):
|
if (sub_pages and not self.args.get("single_page", False)):
|
||||||
if (processed_pages): log.debug(f"Pages processed so far: {processed_pages}")
|
if (processed_pages): log.debug(f"Pages processed so far: {len(processed_pages)}")
|
||||||
for sub_page in sub_pages:
|
for sub_page in sub_pages:
|
||||||
if not sub_page in processed_pages.keys():
|
if not sub_page in processed_pages.keys():
|
||||||
self.parse_page(sub_page, processed_pages = processed_pages, index = index)
|
self.parse_page(sub_page, processed_pages = processed_pages, index = index)
|
||||||
@ -465,44 +499,48 @@ if __name__ == '__main__':
|
|||||||
# set up argument parser
|
# set up argument parser
|
||||||
parser = argparse.ArgumentParser(description='Generate static websites from Notion.so pages')
|
parser = argparse.ArgumentParser(description='Generate static websites from Notion.so pages')
|
||||||
parser.add_argument('target', help='The config file containing the site properties, or the url of the Notion.so page to generate the site from')
|
parser.add_argument('target', help='The config file containing the site properties, or the url of the Notion.so page to generate the site from')
|
||||||
|
parser.add_argument('--chromedriver', default='bin/chromedriver', help='Path to the chromedriver executable')
|
||||||
|
parser.add_argument("--single-page", action="store_true", default=False, help="Only parse the first page, then stop")
|
||||||
parser.add_argument('--clean', action='store_true', default=False, help='Delete all previously cached files for the site before generating it')
|
parser.add_argument('--clean', action='store_true', default=False, help='Delete all previously cached files for the site before generating it')
|
||||||
parser.add_argument("-v", "--verbose", action="store_true", help="Shows way more exciting facts in the output")
|
parser.add_argument("-v", "--verbose", action="store_true", help="Shows way more exciting facts in the output")
|
||||||
parser.add_argument("--single-page", action="store_true", help="Don't parse sub-pages")
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|
||||||
# set up some pretty logs
|
# set up some pretty logs
|
||||||
import colorama, copy
|
|
||||||
|
|
||||||
LOG_COLORS = {
|
|
||||||
logging.DEBUG: colorama.Fore.GREEN,
|
|
||||||
logging.INFO: colorama.Fore.BLUE,
|
|
||||||
logging.WARNING: colorama.Fore.YELLOW,
|
|
||||||
logging.ERROR: colorama.Fore.RED,
|
|
||||||
logging.CRITICAL: colorama.Back.RED
|
|
||||||
}
|
|
||||||
|
|
||||||
class ColorFormatter(logging.Formatter):
|
|
||||||
def format(self, record, *args, **kwargs):
|
|
||||||
# if the corresponding logger has children, they may receive modified
|
|
||||||
# record, so we want to keep it intact
|
|
||||||
new_record = copy.copy(record)
|
|
||||||
if new_record.levelno in LOG_COLORS:
|
|
||||||
new_record.levelname = "{color_begin}{level}{color_end}".format(
|
|
||||||
level=new_record.levelname,
|
|
||||||
color_begin=LOG_COLORS[new_record.levelno],
|
|
||||||
color_end=colorama.Style.RESET_ALL,
|
|
||||||
)
|
|
||||||
return super(ColorFormatter, self).format(new_record, *args, **kwargs)
|
|
||||||
|
|
||||||
log_screen_handler = logging.StreamHandler(stream=sys.stdout)
|
|
||||||
log_screen_handler.setFormatter(ColorFormatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
|
|
||||||
datefmt="{color_begin}[%H:%M:%S]{color_end}".format(
|
|
||||||
color_begin=colorama.Style.DIM,
|
|
||||||
color_end=colorama.Style.RESET_ALL
|
|
||||||
)))
|
|
||||||
log = logging.getLogger(__name__)
|
log = logging.getLogger(__name__)
|
||||||
log.setLevel(logging.INFO if not args.verbose else logging.DEBUG)
|
log.setLevel(logging.INFO if not args.verbose else logging.DEBUG)
|
||||||
|
log_screen_handler = logging.StreamHandler(stream=sys.stdout)
|
||||||
log.addHandler(log_screen_handler)
|
log.addHandler(log_screen_handler)
|
||||||
|
try:
|
||||||
|
import colorama, copy
|
||||||
|
|
||||||
|
LOG_COLORS = {
|
||||||
|
logging.DEBUG: colorama.Fore.GREEN,
|
||||||
|
logging.INFO: colorama.Fore.BLUE,
|
||||||
|
logging.WARNING: colorama.Fore.YELLOW,
|
||||||
|
logging.ERROR: colorama.Fore.RED,
|
||||||
|
logging.CRITICAL: colorama.Back.RED
|
||||||
|
}
|
||||||
|
|
||||||
|
class ColorFormatter(logging.Formatter):
|
||||||
|
def format(self, record, *args, **kwargs):
|
||||||
|
# if the corresponding logger has children, they may receive modified
|
||||||
|
# record, so we want to keep it intact
|
||||||
|
new_record = copy.copy(record)
|
||||||
|
if new_record.levelno in LOG_COLORS:
|
||||||
|
new_record.levelname = "{color_begin}{level}{color_end}".format(
|
||||||
|
level=new_record.levelname,
|
||||||
|
color_begin=LOG_COLORS[new_record.levelno],
|
||||||
|
color_end=colorama.Style.RESET_ALL,
|
||||||
|
)
|
||||||
|
return super(ColorFormatter, self).format(new_record, *args, **kwargs)
|
||||||
|
|
||||||
|
log_screen_handler.setFormatter(ColorFormatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
|
||||||
|
datefmt="{color_begin}[%H:%M:%S]{color_end}".format(
|
||||||
|
color_begin=colorama.Style.DIM,
|
||||||
|
color_end=colorama.Style.RESET_ALL
|
||||||
|
)))
|
||||||
|
except ModuleNotFoundError as identifier:
|
||||||
|
pass
|
||||||
|
|
||||||
# parse the provided arguments
|
# parse the provided arguments
|
||||||
try:
|
try:
|
||||||
|
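The hunk above moves the `colorama` import and the `ColorFormatter` setup inside a `try` block so that logging still works when `colorama` is not installed. A minimal sketch of that optional-dependency pattern follows. It is not the project's code: `build_logger`, the `RESET` constant, and the logger name are hypothetical, and the formatter here is always defined and simply skips coloring when the import fails.

```python
import copy
import io
import logging

# Optional dependency: fall back to uncolored output if colorama is missing.
try:
    import colorama

    LOG_COLORS = {
        logging.DEBUG: colorama.Fore.GREEN,
        logging.INFO: colorama.Fore.BLUE,
        logging.WARNING: colorama.Fore.YELLOW,
        logging.ERROR: colorama.Fore.RED,
        logging.CRITICAL: colorama.Back.RED,
    }
    RESET = colorama.Style.RESET_ALL
except ModuleNotFoundError:
    LOG_COLORS = {}
    RESET = ""


class ColorFormatter(logging.Formatter):
    def format(self, record):
        # Copy the record so other handlers still see the unmodified level name.
        new_record = copy.copy(record)
        color = LOG_COLORS.get(new_record.levelno)
        if color:
            new_record.levelname = f"{color}{new_record.levelname}{RESET}"
        return super().format(new_record)


def build_logger(name, stream, verbose=False):
    """Hypothetical helper: attach one stream handler using ColorFormatter."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    handler = logging.StreamHandler(stream=stream)
    handler.setFormatter(ColorFormatter(fmt="%(levelname)-8s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger("loconotion-sketch", buffer)
log.info("site generated")
log.debug("hidden unless verbose")
output = buffer.getvalue()
```

With `verbose=False` only the INFO message reaches the stream, which mirrors the `--verbose` switch in the argument parser.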
@@ -1,76 +0,0 @@
|
|||||||
name = "Notion Test Site"
|
|
||||||
# the notion.so page to begin parsing from. This page will become the index.html
|
|
||||||
# of the generated site, and loconotion will parse all sub-pages present on the page.
|
|
||||||
page = "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
|
|
||||||
|
|
||||||
# this site table defines override settings for the whole site
|
|
||||||
# later on we will see how to define settings for a single page
|
|
||||||
[site]
|
|
||||||
## custom meta tags ##
|
|
||||||
[[site.meta]]
|
|
||||||
name = "title"
|
|
||||||
content = "Loconotion Test Site"
|
|
||||||
[[site.meta]]
|
|
||||||
name = "description"
|
|
||||||
content = "A static site generated from a Notion.so page using Loconotion"
|
|
||||||
|
|
||||||
## custom site fonts ##
|
|
||||||
# you can specify the name of a google font to use on the site, use the font embed name
|
|
||||||
# (if in doubt select a style on fonts.google.com and navigate to the "embed" tab to check the name under
|
|
||||||
# CSS rules)
|
|
||||||
# keys control the font of the following elements:
|
|
||||||
# site: changes the font for the whole page (apart from code blocks) but the following settings override it
|
|
||||||
# navbar: site breadcrumbs on the top-left of the page
|
|
||||||
# title: page title (under the icon)
|
|
||||||
# h1: heading blocks, and inline databases' titles
|
|
||||||
# h2: sub-heading blocks
|
|
||||||
# h3: sub-sub-heading blocks
|
|
||||||
# body: non-heading text on the page
|
|
||||||
# code: text inside code blocks
|
|
||||||
[site.fonts]
|
|
||||||
site = 'Roboto'
|
|
||||||
navbar = ''
|
|
||||||
title = 'Montserrat'
|
|
||||||
h1 = 'Montserrat'
|
|
||||||
h2 = 'Montserrat'
|
|
||||||
h3 = ''
|
|
||||||
body = ''
|
|
||||||
code = ''
|
|
||||||
|
|
||||||
## custom element injection ##
|
|
||||||
# 'head' or 'body' to set where the element will be injected
|
|
||||||
# the next dotted key represents the tag to inject, with the table values being the tag attributes
|
|
||||||
# e.g. the following injects <link href="favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/> in the <head>
|
|
||||||
[[site.inject.head.link]]
|
|
||||||
rel="icon"
|
|
||||||
sizes="16x16"
|
|
||||||
type="image/png"
|
|
||||||
href="/example/favicon-16x16.png"
|
|
||||||
|
|
||||||
# the following injects <script src="custom-script.js" type="text/javascript"></script> in the <body>
|
|
||||||
[[site.inject.body.script]]
|
|
||||||
type="text/javascript"
|
|
||||||
src="/example/custom-script.js"
|
|
||||||
|
|
||||||
## individual page settings ##
|
|
||||||
# while the [site] table applies the settings to all parsed pages,
|
|
||||||
# it's possible to override a single page's setting by defining
|
|
||||||
# a table named after the page url or part of it.
|
|
||||||
#
|
|
||||||
# e.g. the following settings will only apply to this parsed page:
|
|
||||||
# https://www.notion.so/d2fa06f244e64f66880bb0491f58223d
|
|
||||||
[pages]
|
|
||||||
[pages.d2fa06f244e64f66880bb0491f58223d]
|
|
||||||
## custom slugs ##
|
|
||||||
# inside page settings, you can change the url that page will map to with the 'slug' key
|
|
||||||
# e.g. page "/d2fa06f244e64f66880bb0491f58223d" will now map to "/list"
|
|
||||||
slug = "list"
|
|
||||||
|
|
||||||
[[pages.d2fa06f244e64f66880bb0491f58223d.meta]]
|
|
||||||
# change the description meta tag for this page only
|
|
||||||
name = "description"
|
|
||||||
content = "A fullscreen list database page, now with a pretty slug"
|
|
||||||
|
|
||||||
[pages.d2fa06f244e64f66880bb0491f58223d.fonts]
|
|
||||||
# change the title font for this page only
|
|
||||||
title = 'Nunito'
|
|
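The removed example config describes `[site]` settings that apply to every parsed page and `[pages.<id>]` tables that override them for a page whose URL contains the table name. A rough sketch of one way such an overlay could work is shown below. The `settings_for_page` function and its shallow-merge behaviour are assumptions made for illustration, not Loconotion's actual implementation, and the config is written as a plain dict rather than parsed from TOML.

```python
def settings_for_page(config, url):
    """Hypothetical helper: overlay matching [pages.<key>] tables on [site]."""
    # Copy one level deep so the site-wide defaults are never mutated.
    merged = {
        key: (dict(value) if isinstance(value, dict) else value)
        for key, value in config.get("site", {}).items()
    }
    for page_key, overrides in config.get("pages", {}).items():
        # A page table applies when its name is the page url or part of it.
        if page_key not in url:
            continue
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key].update(value)  # e.g. override a single font
            else:
                merged[key] = value  # e.g. slug, or a replaced meta list
    return merged


config = {
    "site": {"fonts": {"site": "Roboto", "title": "Montserrat"}},
    "pages": {
        "d2fa06f244e64f66880bb0491f58223d": {
            "slug": "list",
            "fonts": {"title": "Nunito"},
        }
    },
}

page = settings_for_page(
    config, "https://www.notion.so/d2fa06f244e64f66880bb0491f58223d"
)
other = settings_for_page(
    config, "https://www.notion.so/A-Notion-Page-03c403f4fdc94cc1b315b9469a8950ef"
)
```

Here `page` picks up the `Nunito` title font and the `list` slug while keeping the site-wide `Roboto` font, and `other` sees only the `[site]` values.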