- cross-posted to:
- rust@programming.dev
- rust@lemmy.ml
- fediverse@lemmy.ml
Anyone know:
1. How to rip a wiki from something like Fandom and save it in a format that could be uploaded to this, and
2. Whether that's legal in the first place?
It’s legal if credit is given and it’s shared under CC-BY-SA.
https://www.fandom.com/licensing
Except where otherwise permitted, the text on Fandom communities (known as “wikis”) is licensed under the Creative Commons Attribution-Share Alike License 3.0 (Unported) (CC BY-SA).
You might find some inspiration in https://breezewiki.com/ - either by reading its codebase directly, or by using it as an intermediary while scraping.
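For the intermediary approach, a minimal sketch of what fetching through BreezeWiki could look like. The URL pattern below (wiki subdomain, then page title) matches how breezewiki.com addresses Fandom wikis, but treat it as an assumption, and keep the request rate gentle:

```python
# Sketch: fetch a page through BreezeWiki (a lighter front-end for
# Fandom) instead of hitting fandom.com directly.
# Assumption: breezewiki.com uses /<wiki>/wiki/<page> paths.
from urllib.request import urlopen

def breezewiki_url(wiki: str, page: str) -> str:
    """Build a BreezeWiki URL for a given Fandom subdomain and page title."""
    return f"https://breezewiki.com/{wiki}/wiki/{page}"

# Example (network call, uncomment to run):
# html = urlopen(breezewiki_url("minecraft", "Creeper")).read().decode()
```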
Fandom offers database dumps in XML; see https://community.fandom.com/wiki/Help:Database_download
- Paste each article’s raw source into ChatGPT and ask it to convert the format for you. If there are too many articles, you can automate it through the API for a negligible cost.
- Is it not?
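A rough sketch of what that API automation could look like. The prompt wording, model name, and client interface here are all assumptions, so check the current OpenAI documentation before relying on any of it:

```python
# Sketch: ask a chat model to convert wikitext to Markdown.
# The prompt deliberately forbids summarizing, given the concerns
# raised below about models rewriting instead of converting.
def build_messages(wikitext: str) -> list[dict]:
    """Build a chat request asking the model to convert, not summarize."""
    return [
        {"role": "system",
         "content": "Convert the following MediaWiki wikitext to Markdown. "
                    "Do not summarize, omit, or invent anything."},
        {"role": "user", "content": wikitext},
    ]

# To actually run it (requires `pip install openai` and OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-4o-mini",  # assumption: any current inexpensive model
#     messages=build_messages(raw_wikitext),
# )
# markdown = resp.choices[0].message.content
```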
Maybe also wget the website.
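If you go the wget route, something along these lines is a reasonable starting point; the flags are standard GNU wget options, and the target URL is a placeholder:

```python
# Sketch: mirror a wiki with GNU wget, driven from Python.
import subprocess

def mirror_command(url: str, dest: str = "mirror") -> list[str]:
    """Build a wget command that mirrors a site for offline browsing."""
    return [
        "wget",
        "--mirror",           # recurse and keep server timestamps
        "--convert-links",    # rewrite links so the mirror works offline
        "--page-requisites",  # also grab CSS/images needed to render pages
        "--wait=1",           # be polite: pause between requests
        "--directory-prefix", dest,
        url,
    ]

# Uncomment to run (placeholder URL):
# subprocess.run(mirror_command("https://example.fandom.com/wiki/Main_Page"), check=True)
```

Note that a raw HTML mirror is harder to re-upload to another wiki than the XML dumps above, since you lose the wikitext source.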
I’d be careful with using “AI.” Sometimes ChatGPT makes up answers even when you provide it with the data. Source: it lies to me all the time.
Converting from one format to another, it can do like gangbusters. I wouldn’t trust it to summarize material from its training data; it does a little better summarizing material you give it. But mechanically finding the text and putting it verbatim into a different markup is something it’s pretty capable of.
Even reformatting has caused me issues. My best example: I gave it 100 citations in a non-standardized format and asked for MLA. It returned 100 citations in MLA, but ten of the books were made up. It dropped ten of the ones I sent at random and invented replacements instead of just converting what I gave it.
Oh… yeah, you might have a point. Beyond a certain number of repeated items, it sometimes goes haywire; I’ve seen that.