• 0 Posts
  • 28 Comments
Joined 1 year ago
Cake day: August 16th, 2023





  • You think in Reddit’s 20-year history no one has thought of indexing comments for data science workloads?

    I’m sure they have, but an index doesn’t have anything to do with the Python library you mentioned.

    Analytics workflows are never run on the production database, always on read replicas.

    Sure, either that or aggregating live streams of data, but either way it doesn’t have anything to do with Elasticsearch.

    It’s still totally possible to sync things to Elasticsearch in a way that won’t affect performance on the production servers. I’m just saying it’s not entirely trivial, especially at the scale Reddit operates at, and there’s a cost for those extra servers and storage to consider as well.

    It’s hard for us to say if that math works out.

    It’s incredibly naive to think that they don’t have a vested interest in identifying organic engagement.

    You would think, but you could say the same about Facebook and I know from experience that they don’t give a fuck about bots. If anything they actually like the bots because it looks like they have more users.


    1. To compare every comment on Reddit to every other comment in Reddit’s entire history would require an index, and if you want to find similar comments instead of exact matches, it becomes a lot harder to do that efficiently. Elasticsearch might be able to do it, but then you need to duplicate all of that data in a separate database and keep it in sync with your main database without slowing things down too much when people are leaving new comments. That would probably be expensive.
    2. Comparing combinations of comments is probably impossible. Reddit has a massive number of comments to begin with, and the number of possible subtrees of those comments would just be absurd. If you only care about comparing entire threads and not subtrees, then this doesn’t apply, but I don’t know how useful that will be.
    3. Programmers just do what they’re told. If the managers don’t care about something, the programmers won’t work on it.
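
To make point 1 a bit more concrete, here’s a rough sketch of why “similar” is harder than “exact match”. It’s a toy in-memory inverted index over word shingles with Jaccard similarity. This is not how Reddit or Elasticsearch actually does it; the class and function names (`SimilarCommentIndex`, `shingles`, `jaccard`) and the threshold are made up for illustration.

```python
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles in a comment (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        # Short comments become a single shingle (or nothing if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class SimilarCommentIndex:
    """Hypothetical toy index: shingle -> ids of comments containing it."""

    def __init__(self, k=3):
        self.k = k
        self.postings = defaultdict(set)  # shingle -> comment ids
        self.shingle_sets = {}            # comment id -> its shingles

    def add(self, comment_id, text):
        s = shingles(text, self.k)
        self.shingle_sets[comment_id] = s
        for sh in s:
            self.postings[sh].add(comment_id)

    def similar(self, text, threshold=0.5):
        """Return (comment_id, score) pairs at or above threshold, best first."""
        query = shingles(text, self.k)
        # Only comments sharing at least one shingle get scored, which is
        # what saves us from comparing against every comment ever stored.
        candidates = set()
        for sh in query:
            candidates |= self.postings.get(sh, set())
        scored = [(cid, jaccard(query, self.shingle_sets[cid])) for cid in candidates]
        hits = [(cid, score) for cid, score in scored if score >= threshold]
        return sorted(hits, key=lambda pair: -pair[1])


index = SimilarCommentIndex()
index.add("c1", "this is the best thing I have seen all day")
index.add("c2", "this is the best thing I have seen all week")
index.add("c3", "completely unrelated words about something else entirely")
matches = index.similar("this is the best thing I have seen all day")
```

Even in this toy version, the index stores every shingle of every comment, so it ends up several times bigger than the comments themselves. That’s the duplicated storage and sync cost I mean, and it’s before you try to handle reworded comments that share no exact shingles.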








  • 40 have been on the site longer than 2 weeks. I can only filter for less than two weeks, not greater than, so I won’t bother with that, but here’s a summary of the first 5 that have a garage, aren’t pending, and are less than $300k:

    • “This house needs work and is priced accordingly”

    • “This is a 55 and older only community.”

    • A trailer with a detached garage for $250k. That’s just insulting.

    • A nice condo for $299,900, right where I said most houses would be.

    • A kind of ugly place for $230k. The description says it only needs cosmetic improvements, so this might actually be a good one to buy.

    So hey, in the first 5 there’s one house that seems reasonable. Maybe there are a few more if I go through the rest. Still, for a city of 200k people, being able to count the reasonably priced houses on your fingers is not very good.