• 28 Posts
  • 58 Comments
Joined 2 years ago
cake
Cake day: July 9th, 2023

help-circle


  • It all depends on how it’s represented on disk though and how the query is executed. Sqlite only supports numbers and strings, and if you keep using a VARCHAR, a read of those rows are going to have materialize a string into memory inside the sqlite library. DuckDB has more types, but if you’re using varchars everywhere, something has to read that string into memory unless you can push down logic into a query that doesn’t actually have to read the actual value, such as one that can use indices.

    The best way is to change the representation on disk, such as converting low-cardinality columns like the station into a numeric id. A standard int being four bytes is a lot more efficient than an n-byte string + a header and it can be compared by value.

    This is where file formats, like Parquet, shine. They’re oriented more towards parsing by systems. JSON is geared towards human parsing.





  • The companion post, I Went To SQL Injection Court, goes into detail about the court process and witness testimony. One of the interesting things is just how different computer people think about security vs lawyers. Somebody might say that having a schema would help a malicious actor a small amount, and a lawyer will jump on that to deny the request. The idea that the schema would help a malicious actor is the same as a map helping a bank robber. The vault security and security guards are the relevant factors for this, not the map.

    I’ll keep this in mind the next time I’m an expert witness in a computer case (based on this, I hope I’m not.)















  • Not all filtering is the same. Client side filtering requires more data to passed over the network that then just gets dropped. It also means rules that are not shared across devices.

    Most importantly, these use CSS filters which are computationally more expensive because it has to take an entire DOM element, serialize it to text, string search it vs a server side filter that can just look at a one or two field variables. Even if it’s not filtered in SQL on Lemmy’s side I’d say it’s still more efficient overall.

    You do what you want, but adding extra work on the client side is not what I’d want for my users. Of course, if your Lemmy instance does not supporting filtering, then this is moot.