Hacker News | pandaforce's comments

Most vision models are trained on images or conventional video codecs. There's a good reason why H200s have 7 JPEG + 7 NVDEC ASICs.


That's basically what VC-2 is, as well as JPEG2000-HT. Wavelets are nice in that they need no deblocking, since each slice complements its neighbours, and they're simple to calculate. But that spatial localization makes them poor for compression, since they have bad frequency decomposition properties. JPEG2000 showed that.


FFv1's range coder has higher complexity than CABAC. The issue is serialization: mainstream codecs require that a block depend on previously decoded blocks. Tiles exist, but they're so much larger, and so rarely used, that they may as well not exist.


The 6K ProRes streams that consumer cameras record are still too heavy for modern CPUs to decode in realtime, not to mention the 12K ProRes that professional cameras output.


How do you figure? Have you tried? The CPU is required for IO. Decoding ProRes is pretty simple, which is why you can do it in a shader in the first place, and the CPU will already be touching every byte when you're using Vulkan.


Yes. I get 300fps decoding 8k ProRes on a 4090 and barely 50fps on a Zen 3 with all 16 cores running. The CPU doesn't touch anything, actually. We map the packet memory and let the GPU read it out directly via DMA. The data may be in a network device, or an SSD, and the GPU will still read it out directly. It's neat.


50fps is greater than 24fps, which means it's faster than realtime, no?

> We map the packet memory and let the GPU read it out directly via DMA

packets from where, exactly?

another POV is: all the complexity of what you are doing, all the salaries and whatever spent on this prores thing, is competing with purchasing a $1,000 Mac and plugging a cable in.

if your goal is to use a thing in the Apple ecosystem, the solution is the Apple ecosystem. it isn't to create a shader for vulkan.

i didn't say that streaming prores 8k doesn't make sense, even if it were 60fps. i am saying that it doesn't make sense to do product development figuring out how to decode this stuff in more ways.


The main target for this is NLEs like Blender. Performance is a large part of the issue. Most users still just export TIFF files per frame before importing them into a "real editor" like Resolve. Apple may have ASICs for ProRes decoding, and Resolve may be the standard editor that everyone uses.

But this goes beyond what even Apple offers, by making it possible to work directly with compressed lossless video on consumer GPUs. You can get hundreds of FPS encoding or decoding 4K 16-bit FFv1 on a 4080, while only reading a few gigabits of video per second, rather than the tens or even hundreds of gigabits that SSDs can't keep up with. No need for image degradation when passing intermediate copies between CG programs and editors, either.


Yep! Almost finished implementing support in https://ossia.io, which is going to become the first open-source cross-platform real-time visuals software to support live scrubbing for VJ use cases in 4K+ prores files, on not that big of a GPU (tested on my laptop's 3060) :)


How to feed MilkDrop music visualizations?

(MilkDrop3, projectm-visualizer/presets-cream-of-the-crop, westurner/vizscan for photosensitive epilepsy)

mapmapteam/mapmap does open source multi-projector mapping. How to integrate e.g. mapmap?

BespokeSynth is a C++ and JUCE based patch bay software modular synth with a "node-based UI" and VST3, LV2, AudioUnit audio plugin support. How to feed BespokeSynth audio and possibly someday video? Pipewire and e.g. Helvum?


- MilkDrop: I'd love a PR that adds support for ProjectM :D it would be fairly easy to make a custom plug-in that just blits the texture.

Basic code for this would look like this:

    struct MilkdropIntegration
    {
      halp_meta(name, "ProjectM")
      halp_meta(c_name, "projectm")
      halp_meta(category, "Visuals")
      halp_meta(author, "ProjectM authors")
      halp_meta(description, " :) ")
      halp_meta(uuid, "417534da-3625-404a-b74f-91d003cb64b9")
    
      // By now you know the drill: define inputs, outputs...
      struct
      {    
        struct : halp::lineedit<"Program", "">
        {
          halp_meta(language, "eel2")
        } program;
      } inputs;
    
      struct
      {
        struct
        {
          halp_meta(name, "Out");
          halp::rgba_texture texture;
        } image;
      } outputs;
    
      halp::rgba_texture::uninitialized_bytes bytes;
    
      void operator()()
      {
        if(bytes.empty())    
          bytes = halp::rgba_texture::allocate(800, 600); // or whatever resolution you wanna set
          
        // Fill in bytes with your custom pixel data here
        
        outputs.image.texture.update(bytes.data(), 800, 600);
      }
    };
inside such a template: https://github.com/ossia-templates/score-avnd-simple-templat...

- multi-projector mapping: ossia actually does it directly! it's in git master and will be released in the next version. It also supports a fair number of features that MapMap does not have, such as:

* soft-edge blending

* blend modes

* custom polygons

* a proper HDR passthrough as well as tonemapping, etc.

* Metal, Vulkan, D3D11/12 support (mapmap is opengl-only)

* Spout, Syphon, NDI, soon pipewire video. Mapmap only supports camera input.

* HAP and DXV, both decoded on GPU.

* Smooth grid distortion. Here's mapmap grid distortion: https://streamable.com/1nhwxg vs ossia with sufficiently high subdivisions: https://streamable.com/hmb1jm

* And of course, as mentioned here, hw decoding (for some years already); the new feature adds zero-copy when, for instance, using Vulkan Video with the Vulkan GPU backend.

* In addition pretty much every YUV pixel format in existence is GPU-decoded (https://github.com/ossia/score/tree/master/src/plugins/score...).

In contrast, Mapmap does gstreamer -> Qt; everything, including the YUV -> RGBA conversion, goes through the CPU.

- How to feed BespokeSynth audio and possibly someday video? Pipewire and e.g. Helvum?

yes, pipewire (or jack on windows, or blackhole on macOS). Although ossia also supports VST, VST3, LV2, CLAP, JSFX, and Faust, and comes with many audio effects built in already.


I don’t understand the spread of thoughts in your post.

The reason to create image sequences is not that you need to send them to other apps; it's that you preserve quality and safeguard against crashes.

A crash mid video write out can corrupt a lengthy render. With image sequences you only lose the current frame.

People aren’t going to stop using image sequences even if they stayed in the same app.

And I’m not sure why “this goes beyond what Apple has” applies, because they do have hardware support for decoding several compressed codecs (I’ll note that ProRes is also compressed). Other than streaming, when are you going to need that kind of encode performance? Or what other codecs are you expecting to suddenly pop up once ASICs aren't required?

Also how does this remove degradation when going between apps? Are you envisioning this enables Blender to stream to an NLE without first writing a file to disk?


> A crash mid video write out can corrupt a lengthy render. With image sequences you only lose the current frame.

You wouldn't contain FFv1 in MP4, the only format incompetent enough for such corruption.

Apple has an interest against people using codecs they get no fees from, and Apple doesn't have a lossless codec. So they don't offer lossless compressed video acceleration.

The idea is that when working as part of a team and you get handed a CG render, you can avoid being sent a huge .tar or .zip file full of TIFFs which you then decompress, or ProRes, which loses quality, particularly in a linear colorspace like ACEScg.


I’m curious what kind of teams you’re working in that are handing around compressed archives of image sequences? And using TIFF vs EXR (unless you mean purely after compositing)?

Another reason to use image sequences is that it’s easier to re-render just a portion of the sequence. Granted, this can be done with video too, but with higher overhead.

But even then, why does GPU encoding change the fact that you’d send it to another NLE? I just feel like there are a lot of jumps in the thought process here.


I thought an industry standard was to use proxy files; the open source editor Shotcut uses them, for example. Create a low-resolution, intra-frame-only version of the file for very fast scrubbing, make your edits on that, and when done the edit list is applied to the full-resolution rushes to produce the output.


Often but not always. Sometimes you’re just working with proxies directly, audio mixing and the like. VFX workflows, finishing will be online full res often.

But even so, everybody is often making their own proxies all the time. There’s a lot of passing around of ProRes Proxy or another intermediate-quality format, and you still make even lighter proxies locally, so NLEs and workstation apps will still benefit from this.


Proxy files have issues when doing coloring, greenscreens, or effects shots: the bit depth, chroma resolution, and primaries/transfer/colorspace get changed. They're basically only usable when editing. With this, you don't need proxy files at all.


Yes, but no. No, in that these days GPUs are entirely scalar from the point of view of a single invocation. Using vectors in shaders is pointless - they will be as fast as scalar variables (dual instruction dispatch on AMD GPUs being an exception).

But yes from the point of view that a collection of invocations, all progressing in lockstep, get their arithmetic done by vector units. GPUs have just gotten really good at hiding what happens when execution paths branch between invocations.


Yeah, Vulkan is shedding most of its abstractions. Buffers are no longer needed - just device addresses. Shaders don't need to be baked into a pipeline - you can use shader objects. Even images rarely provide any speedup over buffers, since the texel cache is no longer separate from the memory cache.

GPUs these days have massive caches, often hundreds of megabytes, on top of an already absurd number of registers. A random read will often load a full cacheline into a register and keep it there, reusing it as needed between invocations.


The article explicitly mentions that mainstream codecs like H264 are not the target. This is for very high bitrate high resolution professional codecs.


These are all gripes you might have with Vulkan Video. Unlike with Vulkan Video, in compute, bounds checking is the norm: overreading a regular buffer will not result in a GPU hang or crash. If you use pointers it will, but then it's up to you to check whether overreads can happen.

The bitstream reader in FFmpeg for the Vulkan compute codecs is copied from the C code, along with its bounds checking. The code that validates whether a block is corrupt or decodable is also taken from the C version. To date, I've never gotten a GPU hang while using the compute codecs.


I wrote the Vulkan ProRes backend. The bitstream decoder was implemented from scratch, for a number of reasons.

First, the original code was reverse-engineered before Apple published an SMPTE document describing the bitstream syntax. Second, I tried my best at optimizing the code for GPU hardware. And finally, I wanted to take the learning opportunity :)

And to answer the parent's question, the shaders are written in pure GLSL. For instance, this is the ProRes bitstream decoder in question: https://code.ffmpeg.org/FFmpeg/FFmpeg/src/branch/master/liba...


glsl: this is the really bad part, as this is a definitive no-no.

Should have been a plain and simple C-coded generator of SPIR-V bytecode.


Khronos published a post on the Vulkan compute codecs in FFmpeg: https://www.khronos.org/blog/video-encoding-and-decoding-wit...



Is there a 'HW' guide to show the expected performance of Vulkan compute codecs anywhere?

