WF16 Video Architecture
Latest revision as of 16:41, 4 January 2026
Global settings
CRT emulation, only for low resolution layers.
640x480 for 4:3 output, or 960x540 for 16:9 output, if the bandwidth allows.
Non-integer pixel aspect flags, again only for low-res layers. Match the non-square aspects of 320x200 and 256x200 by blending onto a 480p/540p base output. Keep it at 60Hz.
Select 50 or 60 Hz in any resolution. Ditch 70Hz, as nothing in the PC space syncs to it for compatibility. 50Hz is much lower priority, but can be done by extending the vblank time and keeping the pixel clock the same. If 640x400 is still used, run it at 60Hz, again with the same pixel clock but a longer vblank.
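Here is the arithmetic for "same pixel clock, longer vblank". It assumes the standard VGA 640x480 timing: 25.175 MHz pixel clock, 800 clocks per line and 525 lines, giving about 59.94 Hz. These are not decided WF16 timings. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Total lines per frame (visible + vblank) needed to reach refresh_hz when the
   pixel clock and the horizontal total stay fixed, rounded to the nearest line. */
uint32_t vtotal_for_refresh(uint32_t pixel_clock_hz, uint32_t htotal, uint32_t refresh_hz)
{
    assert(htotal > 0 && refresh_hz > 0);
    uint64_t denom = (uint64_t)htotal * refresh_hz;   /* pixel clocks per line, times frames/s */
    return (uint32_t)(((uint64_t)pixel_clock_hz + denom / 2) / denom);
}

/* Lines left for vertical blanking once the visible lines are taken out. */
uint32_t vblank_lines(uint32_t vtotal, uint32_t visible_lines)
{
    assert(vtotal >= visible_lines);
    return vtotal - visible_lines;
}
```

With those numbers, 50Hz needs about 629 total lines, which is 149 blank lines instead of VGA's 45. Running 640x400 on the unchanged 525-line frame leaves 125 blank lines. Rounding the 60Hz case gives 524 lines (about 60.06 Hz); standard VGA keeps 525 and lands at 59.94 Hz.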
Global scroll register, for setting where 0,0 is on the display. This might also change how the raster lines are counted, going from say -100 to 380 instead of 0 to 480, to line up with borders and such.
Let the mouse pointer pick a CLUT, instead of being locked to grayscale. Or have its own dedicated 16-color one.
Palettes
Reduce palette entries from 24-bit to 16-bit, which is better suited to the 65816 and makes a lot of addressing simpler.
5-5-5-1 masked, or 4-4-4-4 RGBA? Leaning towards the latter. Using transparency 0=opaque, 15=fully transparent is probably easier.
Have an FPGA block which separates and combines the 4 channel values into 1 word, all as R/W registers, to avoid all the shifting. Include signed clamping when converting to the single RGB word.
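Here is a software model of what that split/combine block could do. It assumes the 4-4-4-4 word is laid out A:R:G:B from the top nibble down, so $F000 is fully transparent black. The channel registers are signed, so a fade can over- or undershoot, and they are clamped to 0..15 on combine. `Channels`, `argb4444_pack` and `argb4444_unpack` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* The four separated channel registers. Signed so that adds and subtracts
   (fades, tints) can go out of range before being clamped. */
typedef struct {
    int16_t a, r, g, b;   /* a is transparency: 0 = opaque, 15 = clear */
} Channels;

static uint16_t clamp_nibble(int16_t v)
{
    if (v < 0)  return 0;
    if (v > 15) return 15;
    return (uint16_t)v;
}

/* Combine: clamp each signed channel to 0..15, then pack into one word. */
uint16_t argb4444_pack(Channels c)
{
    return (uint16_t)((clamp_nibble(c.a) << 12) | (clamp_nibble(c.r) << 8) |
                      (clamp_nibble(c.g) << 4)  |  clamp_nibble(c.b));
}

/* Separate: split a packed word into the four channel registers. */
Channels argb4444_unpack(uint16_t word)
{
    Channels c = {
        .a = (word >> 12) & 0xF,
        .r = (word >> 8) & 0xF,
        .g = (word >> 4) & 0xF,
        .b = word & 0xF,
    };
    return c;
}
```

For example, writing a = -3, r = 20, g = 7, b = -1 and reading back the combined word gives $0F70.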
Rougher ideas
Palettes are always 4-4-4-4, but maybe direct color 5-5-5-1 or 4-4-4-4 can be used for bitmap layers? Probably best to keep it the same, but given how clear today's displays are, 5-5-5 would look better for full-color backgrounds.
HDMA
After a line has been rendered to the line buffer, run the HDMA list. Once the HDMA list is done, trigger the EOL interrupt if enabled. The time a line and its HDMA take is variable. The EOL interrupt should always fire for a line, even if there is no time left.
Ideally, if the NMI line could be connected, the EOL interrupt could be dedicated to it to minimize latency.
Initially, each HDMA list entry should contain a line number to wait for, a count, that many address/data pairs in video I/O space to write, and whether to fire an interrupt.
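The list format isn't pinned down yet. This sketch models an entry as a C struct so the run order is visible: after each rendered line, every entry whose wait line has been reached runs, then the EOL interrupt fires. The field widths, the 256-register I/O window and all names are hypothetical. An entry with a count of 0 and `fire_irq` set behaves like a plain raster interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of an HDMA list entry. */
typedef struct { uint16_t addr; uint16_t data; } HdmaWrite;

typedef struct {
    uint16_t wait_line;       /* raster line to wait for */
    uint8_t  count;           /* number of address/data pairs that follow */
    bool     fire_irq;        /* request an interrupt once the writes are done */
    const HdmaWrite *writes;
} HdmaEntry;

typedef struct {
    uint16_t io[256];         /* stand-in for the video I/O registers */
    size_t   pc;              /* HDMA "program counter": next entry to run */
    bool     eol_enabled;
    unsigned entry_irqs;      /* interrupts requested by list entries */
    unsigned eol_irqs;        /* end-of-line interrupts */
} VideoState;

/* Called once `line` is in the line buffer: run every entry whose wait_line
   has been reached, stop at the first one still in the future, then fire the
   EOL interrupt, which fires for every line regardless of how long this took. */
void hdma_after_line(VideoState *vs, const HdmaEntry *list, size_t len, uint16_t line)
{
    while (vs->pc < len && list[vs->pc].wait_line <= line) {
        const HdmaEntry *e = &list[vs->pc++];
        for (uint8_t i = 0; i < e->count; i++) {
            assert(e->writes[i].addr < 256);
            vs->io[e->writes[i].addr] = e->writes[i].data;
        }
        if (e->fire_irq)
            vs->entry_irqs++;
    }
    if (vs->eol_enabled)
        vs->eol_irqs++;
}
```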
Advanced features would be loading a value from a table, offset by the raster number, and running the HDMA (with optional interrupt) every line until the target line is reached.
There could be some BRAM dedicated to HDMA use, or other on-chip variable storage. That would free up the bus and allow smaller indices for where to copy from. It could also hold some state variables.
The raster line would likely be the graphics line, not the hi-res line, but this could also be an option.
The HDMA "program counter" is visible and editable, and can be safely modified after the EOL interrupt. If it's $000000, then it's disabled? Should HDMA lists be on-chip? Should HDMA variables be on-chip, referred to by small indices in the copy entries?
This has the effect of externalizing and obsoleting the rasterline interrupt registers, as the HDMA could just fire interrupts on a list of lines instead, without running any actual DMA.
The SNES sets up the destination as part of the HDMA config, and each line then only carries the data to write. This saves cycles and is probably reasonable to implement, given what these tend to be used for. Examples at https://snes.nesdev.org/wiki/HDMA_examples
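Here is a simplified sketch of the SNES-style variant, combined with the "table offset by the raster number" idea. The destination register is configured once per channel, and each active line just indexes a data table. It is much simpler than the real SNES HDMA table format. `HdmaChannel` and `hdma_channel_step` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SNES-style channel: the destination is fixed in the channel
   config, and each raster line only supplies one data word. */
typedef struct {
    uint16_t dest;            /* video I/O register written on every active line */
    uint16_t first_line;      /* first raster line the channel runs on */
    uint16_t last_line;       /* last raster line, inclusive */
    const uint16_t *table;    /* one data word per active line */
} HdmaChannel;

/* Run the channel for `line`. The table is indexed by the raster line relative
   to first_line, so no per-line address has to be stored or fetched. */
void hdma_channel_step(const HdmaChannel *ch, uint16_t io[256], uint16_t line)
{
    if (line < ch->first_line || line > ch->last_line)
        return;
    assert(ch->dest < 256);
    io[ch->dest] = ch->table[line - ch->first_line];
}
```

Pointing `dest` at a palette entry and filling `table` with one color per line gives the classic gradient or raster-bar effect, with no CPU involvement per line.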
Layers
Every layer definition has:
- Type
- Base pointer (could be page-aligned 16-bit, for a 16MB range?, else a 32-bit pointer)
- x/y pixel scroll (16-bit, wrapping). These could be shared in scroll groups, but duplicating them isn't a big deal
- CLUT selection
- Bit depth?
- Clip window?
- Last layer flag? Meaning color 0 isn't transparent and uses the CLUT color to fill in the background
- High or low resolution? Some modes and bit depths are low enough bandwidth to do at 480p
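One way the list above could map onto registers is shown below. All field widths and names are guesses for illustration. It shows why a page-aligned 16-bit base pointer reaches 16MB: 65,536 pages × 256 bytes. It also shows that 16-bit scroll values wrap on their own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout for one layer definition. */
typedef enum { LAYER_OFF, LAYER_TILES, LAYER_BITMAP, LAYER_SPRITES, LAYER_RLE } LayerType;

typedef struct {
    uint8_t  type;              /* a LayerType value */
    uint16_t base_page;         /* base pointer in 256-byte pages: 65536 pages = 16 MiB */
    uint16_t scroll_x;          /* 16-bit scroll, wraps on overflow */
    uint16_t scroll_y;
    uint8_t  clut;              /* CLUT selection */
    uint8_t  bpp_log2;          /* 0..3 -> 1, 2, 4, 8 bpp */
    uint16_t clip_x0, clip_x1;  /* optional clip window, inclusive columns */
    bool     last_layer;        /* color 0 is not transparent: fill with the CLUT color */
    bool     hi_res;            /* render at full output resolution */
} LayerDef;

/* Expand the page-aligned 16-bit base pointer to a 24-bit byte address. */
uint32_t layer_base_address(const LayerDef *l)
{
    return (uint32_t)l->base_page << 8;
}

unsigned layer_bpp(const LayerDef *l)
{
    assert(l->bpp_log2 <= 3);
    return 1u << l->bpp_log2;
}
```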
64-byte cache per layer, for retaining sprite/tile graphics for reuse (covering more pixels when bpp is smaller), or a readahead buffer for DDR3 in RLE and bitmap modes, which don't reuse anything anyway.
No-overdraw bandwidth reduction
Have multiple hardware instances of layer renderers, all fighting for external bandwidth. Render front-to-back, with transparent pixels causing the next layer underneath to want to draw that pixel. Each layer only requests individual pixels that it needs to draw, and keeps some cache for redrawing the same sprite/tile on the same line. The first layer gets all pixels requested. The only wasted reads are when a read pixel is transparent and dispatches deeper, and 16-bit wide reads where not all the bits are used.
If the tilemap is in DDR3, its data is probably fetched regardless of visibility. For SRAM, just grabbing and caching the current tile is fine. Assuming tile 0 is blank means none of its pixels need to be fetched.
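Below is a sequential model of the front-to-back idea. The real design has parallel renderers competing for bandwidth; this only counts which pixel reads happen. Each layer reads only the pixels that nothing in front has covered yet. A transparent (0) pixel passes the request deeper. `composite_front_to_back` and the 8-pixel line are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_WIDTH 8   /* tiny line for illustration */

/* layers[0] is the frontmost layer; pixel value 0 is transparent.
   Writes the visible pixels to `out` and returns the number of pixel reads. */
size_t composite_front_to_back(const uint8_t layers[][LINE_WIDTH],
                               size_t layer_count, uint8_t out[LINE_WIDTH])
{
    bool covered[LINE_WIDTH] = {false};
    size_t fetches = 0;

    for (size_t x = 0; x < LINE_WIDTH; x++)
        out[x] = 0;                      /* background if nothing covers it */

    for (size_t l = 0; l < layer_count; l++) {
        for (size_t x = 0; x < LINE_WIDTH; x++) {
            if (covered[x])
                continue;                /* hidden behind a nearer layer: no read */
            uint8_t px = layers[l][x];   /* models one external read */
            fetches++;
            if (px != 0) {               /* opaque: this layer claims the pixel */
                out[x] = px;
                covered[x] = true;
            }                            /* transparent: request goes deeper */
        }
    }
    return fetches;
}
```

Take three layers whose opaque spans cover 2, 4 and 8 pixels from the left. The model reads 8 + 6 + 4 = 18 pixels instead of 24. The wasted reads are exactly the transparent pixels that dispatched deeper.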
Alpha Transparency
A pixel entry in the line buffer holds a 16-bit 4-4-4-4 value (12-bit color plus 4-bit transparency) and goes through up to 3 states: uninitialized, holding transparent, and opaque, with holding transparent being optional.
| Init State | Empty Pixel | Transparent Pixel | Opaque Pixel |
|---|---|---|---|
| Uninit | No-op → Uninit | Set → Transparent | Set → Opaque |
| Transparent | No-op → Transparent | Blend (or no-op) → Transparent | Blend → Opaque |
When a pixel transitions to opaque, it is complete. The common case is uninit to opaque, which shouldn't take extra clock cycles, unless the blend is constant and pipelined.
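Here is a sketch of the table as a per-pixel state machine. Incoming pixels come from the next layer back, in front-to-back order. It takes the table's "no-op" option for a second transparent pixel and blends only when something opaque arrives behind. Once opaque, the pixel ignores later input, matching "the pixel is complete". It assumes the A:R:G:B nibble layout with 0 = opaque and 15 = fully transparent. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { PIX_UNINIT, PIX_TRANSPARENT, PIX_OPAQUE } PixState;

typedef struct {
    PixState state;
    uint16_t argb;   /* 4-4-4-4, top nibble = transparency (0 = opaque) */
} LinePixel;

/* Mix one 4-bit channel: `front` covers `back` with transparency t (0..15). */
static uint16_t mix4(uint16_t front, uint16_t back, uint16_t t)
{
    return (uint16_t)((front * (15 - t) + back * t + 7) / 15);
}

/* Apply an incoming pixel. Transparency 15 is an empty pixel, 0 is opaque. */
void line_pixel_apply(LinePixel *p, uint16_t incoming)
{
    uint16_t t_in = incoming >> 12;
    if (p->state == PIX_OPAQUE || t_in == 15)
        return;                                  /* complete, or empty pixel: no-op */

    if (p->state == PIX_UNINIT) {                /* Set -> Transparent or Opaque */
        p->argb = incoming;
        p->state = (t_in == 0) ? PIX_OPAQUE : PIX_TRANSPARENT;
        return;
    }

    if (t_in != 0)
        return;                                  /* transparent over transparent: no-op */
    uint16_t t = p->argb >> 12, out = 0;         /* Blend -> Opaque */
    for (int shift = 0; shift <= 8; shift += 4)  /* B, G, R channels */
        out |= (uint16_t)(mix4((p->argb >> shift) & 0xF, (incoming >> shift) & 0xF, t) << shift);
    p->argb = out;                               /* transparency nibble is now 0 */
    p->state = PIX_OPAQUE;
}
```

The common uninit-to-opaque path is a single store. Only the held-transparent case needs the multiply.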
For simplicity, the 'alpha' is actually a transparency channel, with 0 = opaque and 15 = fully transparent. A layer can override transparency, or any individual palette entry can have non-zero transparency. In most cases, a pixel index value of 0 is a skipped pixel and the equivalent of $F000 (fully transparent black).
Individual layers and sprites can have a transparency override that ignores the palette alpha value. Without one, per-color transparency from the palette can still be obeyed. Maybe palette transparency needs to be enabled as well, with just the 12-bit color used by default, meaning we can still use 0→f as clear→opaque as standard.
TODO
Buffering alpha sprites with no-overdraw is harder and probably needs its own line buffer. Both the alpha and opaque layers need to track their own layer depth per pixel when merging together, since maybe only the alpha sprite is visible and not the opaque sprite further back. One simplification would be that alpha sprites don't show any sprites underneath, only tiles, but that's lame. Another would be that alpha sprites overlapping other sprites always combine with those sprites and can't have tiles underneath. That is lame as well. The only real solution is to have the front-to-back (f2b) pass trace through the sprite priority layers instead of using a single combined sprite line buffer, or to split sprite transparency out into its own line buffer, with each pixel carrying its own priority number in both sprite line buffers. Or just support alpha transparency in tile/bitmap/RLE layers and not sprites. Again lame, but probably the most workable solution. A small bitmap layer basically acts like a sprite anyway.
Oh, I guess we need an alpha mode as well. Averaging, brightening, dimming, threshold, gel, etc.
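Here is a rough sketch of a few of those modes as per-channel operations on 4-bit RGB. Each result is saturated to 0..15. The mode names are hypothetical. "Gel" is interpreted here as a multiply filter. Threshold is left out because its intended behavior isn't described.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical blend modes. GEL is read as a multiply (color filter). */
typedef enum { BLEND_AVERAGE, BLEND_BRIGHTEN, BLEND_DIM, BLEND_GEL } BlendMode;

static uint16_t blend4(BlendMode m, int top, int under)
{
    int v;
    switch (m) {
    case BLEND_AVERAGE:  v = (top + under) / 2;   break;
    case BLEND_BRIGHTEN: v = under + top;         break;  /* additive */
    case BLEND_DIM:      v = under - top;         break;  /* subtractive */
    case BLEND_GEL:      v = (top * under) / 15;  break;  /* multiply */
    default:             v = under;               break;
    }
    if (v < 0)  v = 0;    /* saturate to the 4-bit channel range */
    if (v > 15) v = 15;
    return (uint16_t)v;
}

/* Blend the RGB parts of two 4-4-4-4 words; the result is fully opaque. */
uint16_t blend_rgb444(BlendMode m, uint16_t top, uint16_t under)
{
    uint16_t out = 0;
    for (int shift = 0; shift <= 8; shift += 4)   /* B, G, R channels */
        out |= (uint16_t)(blend4(m, (top >> shift) & 0xF, (under >> shift) & 0xF) << shift);
    return out;
}
```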
Indexed Bit Depth
Currently, everything is 8bpp, which costs a lot of bandwidth, makes artwork more work to create, and doesn't port as easily from other systems that use multiple palette swaps onscreen.
For tiles, sprites, and bitmaps, choose 1/2/4/8 bpp. Direct color 16-bit bitmaps would be separate from paletted bitmaps.
Each layer can select a CLUT. Each sprite or tile points to either a starting palette entry, or maybe a bitmask to OR into it.
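The sketch below covers reading a pixel at 1/2/4/8 bpp and applying the OR-mask form of color selection. It assumes the leftmost pixel sits in the most significant bits of each byte, which isn't decided. Index 0 stays 0 so it remains transparent. An OR mask aligned to a multiple of `1 << bpp` acts the same as a starting palette entry. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fetch pixel `x` of a packed row at 1, 2, 4 or 8 bpp, leftmost pixel in the
   most significant bits (an assumption). */
uint8_t packed_pixel(const uint8_t *row, unsigned x, unsigned bpp)
{
    assert(bpp == 1 || bpp == 2 || bpp == 4 || bpp == 8);
    unsigned per_byte = 8 / bpp;
    unsigned shift = (per_byte - 1 - x % per_byte) * bpp;
    return (uint8_t)((row[x / per_byte] >> shift) & ((1u << bpp) - 1));
}

/* Map a raw index into the CLUT by OR'ing in the sprite/tile color bits.
   Index 0 is left alone so it stays transparent. */
uint8_t palette_index(uint8_t raw, uint8_t color_or)
{
    return raw == 0 ? 0 : (uint8_t)(raw | color_or);
}
```

For example, a 2bpp tile with color bits $40 draws its indices 1..3 as CLUT entries $41..$43.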
Bitmaps
Option to wrap. Otherwise, it shows blank pixels outside its range. The divmod can be done once per line.
Size x/y pixels
Small bitmaps with wrapping could make for easy wipes and parallax effects, covering the entire screen with a pattern.
During render, recompute the starting address to allow raster effects including Y stretch/squash.
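Below is a sketch of the once-per-line step. It maps the screen line to a source row, with scroll and an optional Y scale, does the one divmod for wrapping, and returns that row's starting address. Changing `scroll_y` or `y_step` between lines, for example via HDMA, gives stretch/squash effects. The 8.8 fixed-point step, the struct and the function name are hypothetical. X wrapping would be handled the same way when stepping along the row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;     /* byte address of row 0 */
    uint16_t height;   /* rows */
    uint16_t stride;   /* bytes per row */
    bool     wrap;
} Bitmap;

/* y_step is 8.8 fixed point: 0x100 = 1:1, 0x80 = stretched 2x, 0x200 = squashed 2x.
   Returns false when a non-wrapping bitmap has no row here (blank line). */
bool bitmap_row_address(const Bitmap *bm, uint16_t screen_y, int32_t scroll_y,
                        uint32_t y_step, uint32_t *row_addr)
{
    assert(bm->height > 0);
    int32_t y = scroll_y + (int32_t)(((uint64_t)screen_y * y_step) >> 8);
    if (bm->wrap) {
        y %= bm->height;              /* the single divmod for this line */
        if (y < 0)
            y += bm->height;          /* C's % keeps the sign; fold negatives back in */
    } else if (y < 0 || y >= bm->height) {
        return false;
    }
    *row_addr = bm->base + (uint32_t)y * bm->stride;
    return true;
}
```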
1/2/4/8 bpp for indexed bitmaps, option for 4-4-4-4 or 5-5-5-1 for direct color bitmaps.
DDR3 suits bitmaps the best, so ensure that gets supported. A 640x480x8bpp (307,200 = $4B000 bytes) single-layer mode would also be nice for GUI and PC game ports, especially with blitter support. Scrolling can still be supported, and more than 1 layer might be doable as well.
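Here is the frame-size and scanout arithmetic behind that mode. A 640x480 8bpp frame is 307,200 bytes. Reading it once per frame at 60Hz is 18,432,000 bytes/s for that layer alone, before any other layers or blitter traffic. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one frame of a packed bitmap. */
uint32_t frame_bytes(uint32_t w, uint32_t h, uint32_t bpp)
{
    assert((w * bpp) % 8 == 0);   /* whole bytes per row */
    return w * h * bpp / 8;
}

/* Sustained read rate to scan the bitmap out once per frame. */
uint64_t scanout_bytes_per_second(uint32_t w, uint32_t h, uint32_t bpp, uint32_t hz)
{
    return (uint64_t)frame_bytes(w, h, bpp) * hz;
}
```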
Tiles
Layer: base pointer = tilemap pointer, plus a separate tilegfx pointer
Map size x/y tiles
Option to wrap.
By default, tile 0 is assumed empty and no pixels are read, saving bandwidth. It can be used for metadata. Option to have tile 0 non-empty? Maybe the empty tile should be $ffff instead (or whatever the max index is, given the flip bits?), since that would never be in the tileset anyway.
For DDR3 tilemaps, it reads an entire row into a buffer. For SRAM, it fetches each tilemap entry as it becomes visible, then fetches the pixels separately, and caches them.
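Here is a sketch of the per-pixel tilemap lookup. It finds the map entry under a layer pixel, applies wrap or out-of-range blanking, skips tile 0 without any pixel reads, and applies flips to the in-tile coordinate. The entry layout (V-flip in bit 15, H-flip in bit 14, 14-bit index) follows the "flip bits at the MSBs" idea but is an assumption, as are the names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TILE_VFLIP 0x8000u   /* hypothetical bit assignments */
#define TILE_HFLIP 0x4000u
#define TILE_INDEX 0x3FFFu

typedef struct {
    const uint16_t *map;     /* map_w * map_h entries, row-major */
    uint16_t map_w, map_h;   /* in tiles */
    uint8_t  tile_size;      /* 8 or 16 */
    bool     wrap;
} Tilemap;

/* Returns false when no pixel fetch is needed: outside a non-wrapping map,
   or tile index 0 (empty). Otherwise writes the tile index and the
   flip-adjusted pixel position inside that tile. */
bool tile_lookup(const Tilemap *tm, uint32_t x, uint32_t y,
                 uint16_t *tile, uint8_t *px, uint8_t *py)
{
    uint32_t tx = x / tm->tile_size, ty = y / tm->tile_size;
    if (tm->wrap) {
        tx %= tm->map_w;
        ty %= tm->map_h;
    } else if (tx >= tm->map_w || ty >= tm->map_h) {
        return false;
    }
    uint16_t entry = tm->map[ty * tm->map_w + tx];
    *tile = entry & TILE_INDEX;
    if (*tile == 0)
        return false;                        /* empty tile: no pixel reads */
    uint8_t ix = (uint8_t)(x % tm->tile_size), iy = (uint8_t)(y % tm->tile_size);
    *px = (entry & TILE_HFLIP) ? (uint8_t)(tm->tile_size - 1 - ix) : ix;
    *py = (entry & TILE_VFLIP) ? (uint8_t)(tm->tile_size - 1 - iy) : iy;
    return true;
}
```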
Tilegfx
These are self-describing, with a header that takes up tile 0's pixel space.
bpp (1,2,4,8), size (8x8, 16x16), bitmap mode, which would give it a stride of 256 bytes no matter the bpp (or take a stride byte/word value, more complex computation but just once per rasterline).
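The sketch below shows how the two tileset layouts differ in addressing. In packed mode each tile is contiguous. In bitmap mode the tileset is a picture with a fixed 256-byte row stride, so the number of tiles per row depends on bpp and tile size: 32 tiles of 8x8 at 8bpp, 64 at 4bpp. MSB-first pixel packing and the names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte address of a pixel, plus the right-shift that brings it to bit 0. */
typedef struct { uint32_t addr; uint8_t shift; } PixelAddr;

PixelAddr tile_pixel_addr(uint32_t gfx_base, uint16_t index, unsigned size,
                          unsigned bpp, bool bitmap_mode, unsigned px, unsigned py)
{
    assert((size == 8 || size == 16) && px < size && py < size);
    uint32_t bit;
    if (!bitmap_mode) {
        uint32_t tile_bits = size * size * bpp;           /* tiles stored back to back */
        bit = index * tile_bits + (py * size + px) * bpp;
    } else {
        unsigned tiles_per_row = (256 * 8 / bpp) / size;  /* 256-byte stride for any bpp */
        unsigned tx = index % tiles_per_row, ty = index / tiles_per_row;
        bit = (ty * size + py) * 256u * 8 + (tx * size + px) * bpp;
    }
    PixelAddr a = { gfx_base + bit / 8, (uint8_t)(8 - bpp - bit % 8) };
    return a;
}
```

The bitmap-mode division only depends on the tile index, so the "once per rasterline" cost mentioned above applies when a tile's row is first fetched.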
1bpp tiles are interesting, in that we could have transparency modes like text: fg only (direct color selection?), fg/bg (palette index where the 2 start from), fg/bgp (bg0 is always transparent, rest are solid).
Rougher ideas
Neo Geo has auto-animating tiles/sprites: 4 or 8 consecutive tiles in the tileset can be cycled through for animation (cycling the low 2 or 3 bits of the tile index). Global or layer-specific config for how many frames per step.
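The index substitution is small enough to show directly. The low 2 or 3 bits of the stored tile index are replaced with a counter that advances every `frames_per_step` frames. Animated tiles therefore have to sit in aligned groups of 4 or 8. `auto_anim_tile` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Replace the low anim_bits (2 -> 4 frames, 3 -> 8 frames) of a tile index
   with an animation phase derived from the global frame counter. */
uint16_t auto_anim_tile(uint16_t tile, unsigned anim_bits,
                        uint32_t frame_counter, uint32_t frames_per_step)
{
    assert((anim_bits == 2 || anim_bits == 3) && frames_per_step > 0);
    uint16_t mask = (uint16_t)((1u << anim_bits) - 1);
    uint16_t phase = (uint16_t)((frame_counter / frames_per_step) & mask);
    return (uint16_t)((tile & ~mask) | phase);
}
```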
Meta-tilemap mode, so the tilemap holds entries that are 2x2 or 4x4 hardware tilemap entries.
Maybe a tilemap could take 2 layers, and a priority bit in the tilemap can elevate some colors within the tilemap to the higher layer, to avoid making multiple layers for things that will always be in front of the sprites? It's a feature of the Capcom CPS-1, but might be a little overspecific. Only some places in 2D RPGs can use it, where the item is tall enough that the "behind" area doesn't intersect the "in front" area.
Sprites
H-flip at the very minimum. With V-flip and 90° rotation as well (possible since all sprites/tiles are square), all 8 orientations and flips are possible. Rotation is only available in SRAM.
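Below is a sketch of the 3-bit orientation mapping from a destination pixel to the source pixel to read. The "90°" bit is implemented as an axis swap (a transpose), which only works because sprites are square. Combined with the two flips it yields all 8 orientations. For example, swap + V-flip is a clockwise 90° rotation. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint8_t x, y; } Pt;

/* Source pixel for destination (x, y) in a size x size sprite.
   The axis swap is applied first, then the flips. */
Pt orient_pixel(uint8_t x, uint8_t y, uint8_t size, bool hflip, bool vflip, bool swap_xy)
{
    assert(x < size && y < size);
    Pt p = { x, y };
    if (swap_xy) {            /* transpose: only valid for square sprites */
        p.x = y;
        p.y = x;
    }
    if (hflip) p.x = (uint8_t)(size - 1 - p.x);
    if (vflip) p.y = (uint8_t)(size - 1 - p.y);
    return p;
}
```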
16-bit sprite image selection from base pointer, based on bpp & size. Flip bits might be at MSBs of the word.
Sprite sheet mode (kinda like square for tilesets), with a declared stride.
Color selection, direct for 1bpp, starting palette offset for 2/4bpp. Note that both of these could be expressed by OR'ing a mask on top of the color.
Need to figure out something for 8bpp color, to have palette swaps for limited ranges. The layer could have a color range/mask where all colors less than that use the sprite color. But this can probably be done with just an OR mask. 8bpp sprites would normally have a 0 'color', palette swaps would leave a range of bits open and use the individual sprite color to set that, but we don't want that done on all bits, only those that are in "palette range", hence the layer setting. Maybe also only when the color is nonzero? or non-FF?
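Here is one possible reading of the layer "palette range" idea, as a sketch. The layer sets a swap limit. Only nonzero pixel indices below that limit get the sprite's color bits OR'd in, and indices at or above it are shared, fixed colors. Treating 0 as untouched follows the "only when the color is nonzero" question. The function name and limit semantics are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Remap an 8bpp sprite pixel: indices in [1, layer_swap_limit) take the
   sprite's color bits, everything else passes through unchanged. */
uint8_t sprite8_color(uint8_t raw, uint8_t sprite_color, uint8_t layer_swap_limit)
{
    if (raw == 0)
        return 0;                               /* transparent, never remapped */
    if (raw < layer_swap_limit)
        return (uint8_t)(raw | sprite_color);   /* palette-swappable range */
    return raw;                                 /* shared fixed colors */
}
```

With a limit of 16, a sprite drawn with its swappable part in indices 1..15 can be moved to CLUT bank $30 by setting its color to $30. Its other colors ($10 and up) stay shared across all sprites.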
8×8, 16×16, 32×32, 64×64 sizes (2 bit selection, forget 24x24). The TG16 had sprites up to 32x64 and did interesting pseudo-layer and huge enemy stuff with them.
1,2,4,8 bpp (2 bit selection)
(Or should bpp & size be per layer? That might make for a simpler implementation, but varying sprite sizes are probably good. Bpp might still be a consideration for layer config.)
Overdraw avoidance eliminates pulling in unseen pixels, which prevents hardware collision detection, which is fine.
Rougher ideas
Are all sprites independently defined and layered? Are they all from the same base pointer? Would need multiple layer definitions to give multiple base pointers, and that might be a good idea? Each layer involved could take 64 sprites out of 256, max 4 sprite "layers". Maybe that's not something that should be layer-based, but global to the sprite system. 4 base pointers, 4 groups of 64 sprites, 4 hardware elements scanning the sprite registers in parallel.
Unlimited-height sprites with a fixed width?
8bpp color register could be used to bank a subset of colors, maybe a color range (0-7) can be cycled while others are fixed. Think of Age of Empires 1 recoloring for instance. Or just use that to select a CLUT, overriding the layer's selection.
Select the color to be transparent? If using a fixed smaller palette, like DB16/32, then each sprite could pick a different one. Pico-8 has a 16-bit mask for which colors to include or not, which is interesting.
Cut-out sprites wouldn't display, but would clear any pixel from sprites above it, allowing sprites below to show through. Or, it could skip the sprite immediately below if it has a pixel, masking a single sprite, which might be easier to implement.
Figure out sprite zooming. No rotation, just scaling; no inverting with this? Can grow or shrink independently in x & y. Maybe Bresenham? Probably want sub-pixel accuracy, 16-bit fixed point? Or full 32-bit fixed? x1/x2/y1/y2 dest rectangle maybe?
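Below is one option to compare against Bresenham. It is a fixed-point DDA that steps through the source by a 16.16 increment per destination pixel, sampling at pixel centers. The same routine handles rows, so X and Y scale independently. Deriving the step from a dest rectangle (x1/x2) is just `dst_w = x2 - x1 + 1`. `zoom_columns` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Fill src_col[0..dst_w-1] with the source column to read for each
   destination column when a src_w-wide sprite is drawn dst_w pixels wide. */
void zoom_columns(uint16_t src_w, uint16_t dst_w, uint16_t *src_col)
{
    assert(src_w > 0 && dst_w > 0);
    uint32_t step = ((uint32_t)src_w << 16) / dst_w;   /* 16.16 source pixels per dest pixel */
    uint32_t pos = step / 2;                            /* center of the first dest pixel */
    for (uint16_t x = 0; x < dst_w; x++) {
        src_col[x] = (uint16_t)(pos >> 16);
        pos += step;
    }
}
```

Doubling a 4-wide sprite reads columns 0,0,1,1,2,2,3,3. Halving an 8-wide one reads 1,3,5,7. Because the step is rounded down, the last sample never runs past the sprite's right edge.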
RLE
Layer has a height (width is undefined).
The base pointer points to an array of height × 16-bit offsets, one for each line, indexed from the base pointer. Each line is a non-demarcated concatenation of tags:
0lllllll cccccccc: length + color pair, color 0 is typically transparent
1lllllll c...: length + literal pixels
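Here is a sketch of a decoder for one line in this format. Lines have no terminator, so it stops once the output width is filled. Color 0 leaves the underlying pixel alone, which is what makes wipes and borders cheap. It assumes the offsets are little-endian (matching the 65816) and that a length of 0 never occurs. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode line `line` of an RLE layer into out[0..width-1].
   base points at the offset table; each 16-bit offset is relative to base. */
void rle_decode_line(const uint8_t *base, uint16_t line, uint8_t *out, size_t width)
{
    const uint8_t *p = base + (base[line * 2] | (base[line * 2 + 1] << 8));
    size_t x = 0;
    while (x < width) {
        uint8_t tag = *p++;
        size_t len = tag & 0x7F;
        assert(len > 0);
        if (tag & 0x80) {                         /* 1lllllll: literal pixels */
            for (size_t i = 0; i < len && x < width; i++, x++)
                if (p[i] != 0) out[x] = p[i];
            p += len;
        } else {                                  /* 0lllllll cccccccc: run */
            uint8_t c = *p++;
            for (size_t i = 0; i < len && x < width; i++, x++)
                if (c != 0) out[x] = c;
        }
    }
}
```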
Uses:
- Wipes
- Solid color borders
- Raster bars without interrupts
- SNES-style explosion or light ray effects (esp with transparency)
- Compressed cel style images or animations
- Polygon filling, especially with multiple layers instead of merging spans into 1 layer
- Cheap enough to run in hi-res?