• 2 Posts
  • 679 Comments
Joined 6 months ago
Cake day: June 9th, 2024




  • Search will never search non-local content.

    Which is the point I’m trying to make: right now, you cannot use search as a discoverability medium, unless you’re on something the scale of mastodon.social.

    Search with a focus on new content discoverability is utterly useless for smaller or single user instances, because a search that only finds things you already know about isn’t exactly a useful search for discoverability.

    If I have to be on the biggest instances, then there’s very little difference between something like Bluesky and Mastodon in terms of usability, and uh, I might as well pick the one that’s more likely to have the most growth and diversity of content.

    We have to give up on the idea of having easy and direct access to the whole of the fediverse.

    I agree, and it’s why I’ve pretty much migrated back to centralized services with the exception of Lemmy, because Lemmy works very well in terms of finding useful shit to follow in a way that literally no other federated platform does.


  • Privacy regulations are all fine and dandy, but even with the strictest ones in place,

    They’re also subject to interpretation, regulatory capture, as well as just plain being ignored when it’s sufficiently convenient for the regulators to do so.

    “There ought to be a law!” is nice, but it’s not a solution when there’s a good couple of centuries of modern regulatory frameworks having existed, and a couple centuries of endless examples of where absolutely none of it matters when sufficient money and power are in play.

    Like, for example, the GDPR: it made a lot of shit illegal under penalty of company-breaking penalties.

    So uh, nobody in the EU has had their personal data misused since it was passed? And all the big data brokers that are violating it have been fined out of business?

    And this is, of course, ignoring the itty bitty little fact that you have to be aware of the misuse of the data: if some dude does some shady shit quietly, then well, nobody knows it happened to even bring action?


  • How exactly are “communities offering services” a different thing than “hosted software”?

    I think what they’re saying is that the ideal wouldn’t be to force everyone to host their own, but rather for the people who want to run stuff to offer them to their friends and family.

    Kinda like how your mechanic neighbor sometimes helps you do shit on your car: one person shares a skill they have, and the other person also benefits. And then later your neighbor will ask you to babysit their kids, and shit.

    Basically: a very very goofy way of saying “Hey! Do nice things for your friends and family, because that’s kinda how life used to work.”


  • AI model of that type is safe to deploy anywhere

    Yeah, I think you’ve made a mistake in thinking that this is going to be usable as generative AI.

    I’d bet $5 this is just a fancy machine learning algorithm that takes a submitted image, does machine learning nonsense with it, and returns a ‘there is a high probability this is an illicit image of a child’, and not something you could use to actually generate CSAM with.

    You want something that’s capable of assessing the similarities between a submitted image and a group of known bad images, but that doesn’t mean the dataset is in any way usable for anything other than that one specific task - AI/ML in use cases like this is super broad and has been a thing for decades before the whole ‘AI == generative AI’ thing became what everyone is thinking.

    But, in any case: the PhotoDNA database is in one place and access to it is scaled by the merit of uh, lots of money?

    And of course, any ‘unscrupulous engineer’ that may have any plans for doing anything with this is probably not a complete idiot, even if a pedo: they’re going to have shockingly good access controls and logging and well, if you’re in the US, if the dude takes this database and generates a couple of CSAM images using it, the penalty is, for most people, spending the rest of their life in prison.

    Feds don’t fuck around with creation or distribution charges.


  • comparative scale of the content involved

    PhotoDNA is based on image hashes, as well as some magic that works on partial hashes: resizing the image, or changing the focus point, or fiddling with the color depth or whatever won’t break a PhotoDNA identification.

    But, of course, that means for PhotoDNA to be useful, the training set is literally ‘every CSAM image in existence’, so it’s not really like you’re training on a lot less data than an AI model would want or need.

    The big safeguard, such as it is, is that you basically only query an API with an image and it tells you if PhotoDNA has it in the database, so there’s no chance of the training data being shared.

    Of course, there’s also no reason you can’t do that with an AI model, either, and I’d be shocked if that’s not exactly how they’ve configured it.
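
To make the "hash that survives resizing, queried as yes/no" idea above concrete, here is a minimal sketch. PhotoDNA itself is proprietary, so this is not its algorithm: it swaps in a simple average hash, and `downscale`, `average_hash`, `hamming`, `is_known`, and the `max_distance=5` threshold are all hypothetical names and values picked for illustration. It assumes a grayscale image given as a list of pixel rows, at least 8x8 in size.

```python
from typing import Iterable, List

Image = List[List[int]]  # grayscale pixel rows, values 0-255 (assumed input format)


def downscale(img: Image, size: int = 8) -> List[List[float]]:
    """Box-average the image down to size x size (needs at least size x size pixels)."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(img: Image, size: int = 8) -> int:
    """One bit per downscaled cell: 1 if the cell is brighter than the overall mean."""
    flat = [p for row in downscale(img, size) for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_known(img: Image, known_hashes: Iterable[int], max_distance: int = 5) -> bool:
    """Yes/no answer only: the caller never sees the stored hashes or source images."""
    h = average_hash(img)
    return any(hamming(h, k) <= max_distance for k in known_hashes)
```

The design point is the last function: the caller hands over an image and gets back a boolean, which is the same shape as the "query an API, get a yes/no" safeguard described above. A resized copy of a known image lands at (or near) distance zero, while an unrelated image ends up many bits away.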


  • The problem I ran into is that every single platform that primarily interacted with Mastodon (The keys, etc.) had the exact same set of problems.

    While yes, my Firefish instance had search, what was it searching? Local data only, and once I figured out that Mastodon-style replies didn’t federate to all of someone’s followers, it became pretty clear that it was uh, not very useful.

    You can search, but any given server may or may not have access to data you actually want and thus, well, you just plain cannot meaningfully search for shit unless you go to one of the mega instances, or join giant piles of relays and store gigabyte upon gigabyte upon gigabyte of garbage data you do not care about.

    The whole implementation is kinda garbage for search-based discovery from its very basic design all the way through to everyone’s implementations.


  • first time law enforcement are sharing actual csam with a technology company

    It’s very much not: PhotoDNA, which is/was the gold standard for content identification, is a collaboration between a whole bunch of LEOs and Microsoft. The end user is only going to get a ‘yes/no idea’ result on a matched hash, but that database was built on real content working with Microsoft.

    Disclaimer: below is my experience dealing with this shit from ~2015-2020, so ymmv, take it with some salt, etc.

    Law enforcement is also rarely the first-responder to these issues, either: in the US, at least, reports will come to the hosting/service provider first for validation and THEN to NCMEC and LEOs, if the hosting provider confirms what the content is. Even reports that are sent from NCMEC to the provider aren’t being handled by law enforcement as the first step, usually.

    And as for validating reports, that’s done by looking at it without all the ‘access controls and safeguards’ you think there are, other than a very thin layer of CYA on the part of the company involved. You get a report, and once PhotoDNA says ‘no fucking clue, you figure it out’ (which, IME, was basically 90% of the time) a human is going to look at it and make a determination, and then file a report with NCMEC or whatever, if it turns out to be CSAM.

    Frankly, after having done that for far too fucking long, if this AI tool can reduce the amount of horrible shit someone doing the reviews has to look at, I’m 100% for it.

    CSAM is (grossly) a big business, and the ‘new content’ funnel is fucking enormous. That’s why an extremely delayed and reactive thing like PhotoDNA isn’t all that effective: there’s a fuckload of children being abused and a fuckload of abusers escaping being caught simply because there’s too much shit to look at and handle effectively, and thus any response to anything is super super slow.

    This looks like a solution to make it so fewer people have to be involved in validation, and could be damn near instant in responding to suspected material that does need validation, which will do a good job of at least pushing the shit out of easy (ier?) availability and out of more public spaces, which honestly, is probably the best thing that is going to be managed unless the countries producing this shit start caring and going after the producers, which I’m not holding my breath on.





  • Install it and use it?

    Their PDS is self hosted, but it does still rely on the central relays (though you COULD host that yourself if you wanted to pay for it, I suppose?).

    It’s very centralized, but it’s not that different from what you’d have to do to make Mastodon useful: a small/single user instance will get zero content, even if you follow a lot of people, without also adding several relays to work around some of the design decisions made by the Mastodon team regarding replies and how federation works for those kind of things, as well as to populate hashtags and searches and such.

    Though really you shouldn’t do any of that, and just use a good platform for discussion, like a forum or a threadiverse platform. (No seriously, absolutely hate “microblog” shit because it’s designed to just be zingers and hot takes and not actual meaningful conversations.)






  • That’s a wee bit revisionist: Zen/Zen+/Zen2 were not especially performant and Intel still ran circles around them with Coffee Lake chips, though in fairness that was probably because Zen forced them to stuff more cores on them.

    Zen3 and newer, though, yeah, Intel has been firmly in 2nd place or 1st place with asterisks.

    But the last 18 months have had them fucking up in such a way that if you told me that they were doing it on purpose, I wouldn’t really doubt it.

    It’s not so much failing to execute well-conceived plans as it was shipping meltingly hot, sub-par performing chips that turned out to self-immolate, combined with also giving up on being their own fab, and THEN torching the relationship with TSMC before launching the first products TSMC is fabbing for them.

    You could write the story as a malicious evil CEO wanting to destroy the company and it’d read much the same as what’s actually happening (not that I think Patty G is doing that, mind you) right now.


  • Yeah but it’s priced the same as a cheap laptop and/or desktop, which of course doesn’t then require you to pay monthly to actually use the stupid thing.

    It feels like another ‘Microsoft asked Microsoft what Microsoft management would buy, and came up with this’ product, and less one that actually has a substantial market, especially when you’re trying to sell a $350 box that costs you $x a month to actually use as a ‘business solution’.

    This would probably be a cool product at $0 with-a-required-contract-with-Azure, but at $350… meh, I suspect it’s a hard sell given the VDI stuff on Azure isn’t cheap.