Every community I care about is dead

  • 1 Post
  • 23 Comments
Joined 1 year ago
cake
Cake day: June 12th, 2023

help-circle




  • Everyone fully missing the point here. This is the banner image for !linux@programming.dev (that’s not where we are right now for the record), and it has a normal JPEG size of 7.7MB. When it’s served as WebP it’s 3.8MB. OP is correct that this is very stupid and wasteful for a web content image. It’s a triple-monitor 1440p wallpaper that’s used verbatim, and it should instead be compressed down to be bandwidth-friendly. I was able to get it to 1.4MB at JPEG quality 80, and when swapping it out in dev tools and performing A/B testing I can’t tell the difference. This should be brought to the attention of a mod on that community so it can stop sucking people’s data for no reason.


  • JXL is the best image codec we have so far and it’s not even close. I did a breakdown on some of its benefits here. JXL can losslessly convert PNG, JPG, and GIF into itself, and can losslessly send them back the other way too. The main downside is that Google has been blocking its adoption by keeping support out of Chromium in favor of pushing AVIF, which started a chicken and egg problem of no one wanting to use it until everyone else started using it too. If you want to be an early adopter you can feel free to use JXL, but just know that 3rd party software support is still maturing.

    Something you might find interesting is that the original JPEG is such a badass format that they’ve taken a lot of their findings from JXL and made a badass JPEG encoder with it named jpegli. Oddly, jpegli-based JPEGs are not yet able to be losslessly-compressed into JXL files, per this issue - hopefully that will be fixed at some point.





  • Better content by a mile if you’re in some of the good ones. Speeds are always good on private trackers but public trackers can have good speeds on popular things too.

    Some stats (I edited the name into each one so it would be easier to differentiate):

    • RED (1st largest music)
    • OPS (2nd largest music)
    • BTN (Television)
    • PTP (Movies)
    • HDB (All HD content)
    • GGn (Video games)

    Better privacy is objectively true but it’s only security by obscurity, so it’s ~99.9% bulletproof. You can generally torrent without a VPN on private trackers, though some of the larger “semi-private” mega trackers like IPT/TL/etc may still have a tiny chance of getting ISP letters. For a copyright holder to serve your ISP a letter they need to be in your swarm with you. Private trackers are difficult to get access to and have much smaller swarms than public trackers, so copyright holders usually don’t bother. The easier a community is to get access to, and the more users that are hanging around in swarms, the more appetizing it might be for them.

    Edit: Retention is also really good on private trackers. Most people seed everything forever on private trackers, so you don’t have to worry about content being “fresh” for it to be available and seeded.


  • Mirrored vdevs allow growth by adding a pair at a time, yes. Healing works with mirrors, because each of the two disks in a mirror are supposed to have the same data as each other. When a read or scrub happens, if there’s any checksum failures it will replace the failed block on Disk1 with Disk2’s copy of that block.

    Many ZFS’ers swear by mirrored vdevs because they give you the best performance, they’re more flexible, and resilvering from a failed mirror disk is an order of magnitude faster than resilvering from a failed RAIDZ - leaving less time for a second disk failure. The big downside is that they eat 50% of your disk capacity. I personally run mirrored vdevs because it’s more flexible for a small home NAS, and I make up for some of the disk inefficiency by being able to buy any-size disks on sale and throw them in whenever I see a good price.


  • The main problem with self-healing is that ZFS needs to have access to two copies of data, usually solved by having 2+ disks. When you expose an mdadm device ZFS will only perceive one disk and one copy of data, so it won’t try to store 2 copies of data anywhere. Underneath, mdadm will be storing the two copies of data, so any healing would need to be handled by mdadm directly instead. ZFS normally auto-heals when it reads data and when it scrubs, but in this setup mdadm would need to start the healing process through whatever measures it has (probably just scrubbing?)


  • ZFS can grow if it has extra space on the disk. The obvious answer is that you should really be using RAIDZ2 instead if you are going with ZFS, but I assume you don’t like the inflexibility of RAIDZ resizing. RAIDZ expansion has been merged into OpenZFS, but it will probably take a year or so to actually land in the next release. RAIDZ2 could still be an option if you aren’t planning on growing before it lands. I don’t have much experience with mdadm, but my guess is that with mdadm+ZFS, features like self-healing won’t work because ZFS isn’t aware of the RAID at a low-level. I would expect it to be slightly janky in a lot of ways compared to RAIDZ, and if you still want to try it you may become the foremost expert on the combination.


  • ZFS without redundancy is not great in the sense that redundancy is ideal in all scenarios, but it’s still a modern filesystem with a lot of good features, just like BTRFS. The main problem will be that it can detect data corruption but not heal it automatically. Transparent compression, snapshotting, data checksums, copy-on-write (power loss resiliency), and reflinking are modern features of both ZFS/BTRFS, and BTRFS additionally offers offline-deduplication, meaning you can deduplicate any data block that exists twice in your pool without incurring the massive resources that ZFS deduplication requires. ZFS is the more mature of the two, and I would use that if you’ve already got ZFS tooling set up on your machine.

    Note that the TrueNAS forums spread a lot of FUD about ZFS, but ZFS without redundancy is ok. I would take anything alarmist from there with a grain of salt. BTRFS and ZFS both store 2 copies of all metadata by default, so bitrot will be auto-healed on a filesystem level when it’s read or scrubbed.

    Edit: As for write amplification, just use ashift=12 and don’t worry too much about it.


  • ZFS doesn’t eat your SSD endurance. If anything it is the best option since you can enable ZSTD compression for smaller reads/writes and reads will often come from the RAM-based ARC cache instead of your SSDs. ZFS is also practically allergic to rewriting data that already exists in the pool, so once something is written it should never cost a write again - especially if you’re using OpenZFS 2.2 or above which has reflinking.

    My guess is you were reading about SLOG devices, which do need heavier endurance as they replicate every write coming into your HDD array (every synchronous write, anyway). SLOG devices are only useful in HDD pools, and even then they’re not a must-have.

    IMO just throw in whatever is cheapest or has your desired performance. Modern SSD write endurance is way better than it used to be and even if you somehow use it all up after a decade, the money you save by buying a cheaper one will pay for the replacement.

    I would also recommend using ZFS or BTRFS on the data drive, even without redundancy. These filesystems store checksums of all data so you know if anything has bitrot when you scrub it. XFS/Ext4/etc store your data but they have no idea if it’s still good or not.





  • With effort level 7 you should be getting images roughly 2/3’s of the size of the original PNG on average (assuming the PNG is already properly optimized). I would try again with at least effort levels 3, 4, 5, and 7. Also consider that PNGs need very expensive CPU time to properly compress them, using a tool like oxipng.

    What sort of balance are you looking for with regards to filesize and encode time? At the very least, effort levels 1 through 3 will probably still give you better results than PNG while being ridiculously quick, so there shouldn’t be any configuration where PNG is a better choice than JXL with regards to speed.


  • JXL has been ready for practical use for a while now - the only place where JXL support is still missing is browsers (due to Google’s politically-motivated removal from chromium). I’m not sure if anyone has tried using JXL with ML, but it’s certainly ready to be tested right now. IMO JXL has been ready since their libJXL 0.7.0 release, which happened September 2022 last year. They’re still working towards a 1.0 but every image-related application has built-in support for JXL already and it can more or less be considered ready.

    haven’t seen any major downsides, besides less optimal performance for very low resolution images

    Just to note here, to be precise AVIF starts (barely) winning at low fidelity ranges, not low resolution. Meaning if you want a blurry mess that looks like this, AVIF will compress slightly better (that’s an actual AVIF converted to PNG by the way).

    At the risk of sounding like sour grapes, this compression advantage doesn’t truly matter. This level of compression is almost never used, and even if it was, even drastic relative filesize savings would ultimately amount to bytes/kilobytes in the grand scheme of all images you’re serving. It’s more impactful to compress large images simply because they are larger. Smaller images are already small and efficiency deltas in a 1kB vs 1.1kB image are meaningless compared to a 600kB vs 800kB image.

    I also wonder if the support for progressive loading might be useful for more efficient, low resolution variants of high resolution models. Just store one set of high res images and load them in progressive steps to make smaller data sets. Like, say you have a bunch of 8k images, but you only want to make a website banner based on the model from those 8k res images. I wonder if it’s possible to use the the progressive loading support to halt reading in the images at 1k

    I’m not fully confident on this aspect but I’m pretty sure that JXL supports more than just traditional progressive decoding - you can actually pull “complete” images out of the bitstream from arbitrary ranges. Meaning you could efficiently store a full range of quality options in just one image, then serve them on the fly.

    Any time I see a big feature jump, like better file size, I assume the trade off in another feature negates at least half the benefit. It’s pretty rare, from what I’ve seen, to have improvements on all fronts.

    JXL is self-described “alien technology from the future”, and it was made by a “dream team” of image engineers who have had a hand in just about every image codec and compression technique from our past. It also benefits from being a real image codec, whereas every recent image format that has gained widespread adoption has been derived from a video codec (WebP, AVIF, HEIC).

    The only truly useful thing it doesn’t perform best-in-class at is animation encoding (losing to AVIF because it’s based on the amazing AV1 video codec), and I would honestly recommend just serving AV1 videos instead, and skipping image formats entirely.

    A neutral aspect of JXL is that it does worse in single-core decode speed compared to JPEG (which is disgustingly fast), but JXL can be parallelized whereas JPEG cannot. This is ultimately an advantage for JXL for general usecase where users have at least 4 cores available, but for large-scale distributed processing I imagine this property of JPEG may still have an edge use-case?

    If you’re curious about the technical aspects of JXL, I recommend reading their official slidedeck. The nitty-gritty details start at page 59, but the whole thing is a good read.


  • No one uses hardware decoding for images - it’s just not a good fit for the reality of how we use images. Images are small and easy to decode, whereas starting up a hardware decoder takes a non-trivial amount of time. Additionally, GPU decoders only work single-threaded, so each image would have to be decoded one by one, instead of all at once like with CPU decoding. This was already attempted with VP8/WebP and they gave up trying to make it any good. Videos are good candidates for hardware decoding since they’re large and you’re only looking at one at a time.

    If you have benchmarks or some proof showing otherwise by all means post here.