[release_table] Improve script (#305)

- Add strict typing to the fields. This makes the script fail if some column does not have the expected type (for example because of a change in the HTML page).
- Support regex and templating for all fields (not only the releaseCycle). This makes it possible to extract only the necessary information without having to do some sort of 'magic' cleanup (replacements in dates have been reverted).
- Do not inject 'releaseCycle' anymore in the JSON (there is already the name).
Marc Wrobel
2024-02-14 23:40:10 +01:00
parent c6881fef43
commit a801200c11
2 changed files with 80 additions and 28 deletions


@@ -43,16 +43,10 @@ def parse_datetime(text: str, formats: list[str] = frozenset([
# so that we don't have to deal with some special cases in formats
text = (
text.strip()
.replace("th, ", " ") # November 10th, 2015 -> November 10, 2015
.replace("st, ", " ") # March 31st, 2015 -> March 31, 2015
.replace("Augu ", "August ") # 17 Augu 2023 -> 17 August 2023 - revert after st replacement
.replace("augu ", "August ") # 17 Augu 2023 -> 17 august 2023 - revert after st replacement
.replace("rd, ", " ") # March 3rd, 2015 -> March 3, 2015
.replace(", ", " ") # November 10, 2015 -> November 10 2015
.replace(". ", " ") # November 10. 2015 -> November 10 2015
.replace("(", "") # (November 10 2015) -> November 10 2015)
.replace(")", "") # (November 10 2015) -> (November 10 2015
.replace("*", "") # November 10 2015* -> November 10 2015
)
for fmt in formats:
try: