WF16 Video Architecture
Global settings
CRT emulation, only for low resolution layers.
640x480 for 4:3 output, or 960x540 for 16:9 output, if bandwidth can run it.
Non-integer pixel aspect flags, again only for low-res layers. Match the non-square pixel aspects of 320x200 and 256x200, blended onto a 480p/540p base output. Keep it at 60Hz.
Select 50 or 60 Hz in any resolution. Ditch 70Hz, as nothing syncs to that in the PC space for compatibility. 50Hz is much lower priority, but can be done by extending vblank time and keeping pixel clock the same. If 640x400 res is still used, have it at 60Hz, again same pixel clock but longer vblank.
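A quick arithmetic check of the "same pixel clock, longer vblank" idea. This is only a sketch: it assumes the standard VGA 640x480 timing (25.175 MHz pixel clock, 800 clocks per line, 525 lines per frame) as the base mode, which these notes don't actually pin down, and the function names are made up. The 960x540 case isn't covered.

```python
PIXEL_CLOCK_HZ = 25_175_000   # assumed: standard VGA 640x480 pixel clock
CLOCKS_PER_LINE = 800         # assumed: 640 active + 160 horizontal blanking

def refresh_hz(total_lines):
    """Refresh rate for a frame of total_lines scanlines (active + vblank)."""
    return PIXEL_CLOCK_HZ / (CLOCKS_PER_LINE * total_lines)

def lines_for(target_hz):
    """Total scanlines that land closest to target_hz at the fixed pixel clock."""
    return round(PIXEL_CLOCK_HZ / (CLOCKS_PER_LINE * target_hz))

def vblank_lines(active_lines, target_hz):
    """Blank lines needed so active_lines fit into a frame at target_hz."""
    return lines_for(target_hz) - active_lines
```

Under these assumptions, 50Hz comes out to 629 total lines (about 50.03Hz), so 480 active lines get 149 blank lines instead of the usual 45. Running 640x400 in the 525-line frame gives 125 blank lines, versus 49 in the 449-line 70Hz mode.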
Let the mouse pointer pick a CLUT, instead of being locked to grayscale. Or have its own dedicated 16-color one.
Palettes
Reduce from 24-bit to 16-bit, better suited to 65816, and makes a lot of addressing simpler.
5-5-5-1 masked, or 4-4-4-4 RGBA? Leaning towards the latter. Using transparency 0=opaque, 15=fully transparent is probably easier.
Have an FPGA block which separates & combines 4 channel values into 1 word, all R/W registers, to avoid all the shifting in software. Include signed clamping when converting to the single RGB word.
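What the split/combine block would compute, modeled in software. A sketch only: the nibble order (transparency in the top nibble, then R, G, B) is an assumption, picked so that $F000 reads as fully transparent black, and the function names are hypothetical.

```python
def clamp4(value):
    """Clamp a signed intermediate value into the 0..15 range of one channel."""
    return max(0, min(15, value))

def pack_4444(t, r, g, b):
    """Combine four (possibly out-of-range, signed) channels into one 16-bit word."""
    return (clamp4(t) << 12) | (clamp4(r) << 8) | (clamp4(g) << 4) | clamp4(b)

def unpack_4444(word):
    """Split a 16-bit word back into its (t, r, g, b) channels."""
    return ((word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF)
```

The clamp is what makes fades cheap: software can add or subtract a signed delta per channel and write the results back without checking for overflow first.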
Rougher ideas
Palettes are always 4-4-4-4, but direct color 5-5-5-1 or 4-4-4-4 can be used for bitmap layers? Probably best to keep it the same, but given clear displays today, the 5-5-5 would look better for full-color backgrounds.
Layers
Every layer def has these:
- Type
- Base pointer (could be page-aligned 16-bit, for 16MB range?, else 32-bit pointer)
- x/y pixel scroll (16-bit wrapping). Could share these in scroll groups, but not a big deal to duplicate these
- CLUT selection
- Bit depth?
- Clip window?
- Last layer flag? Meaning color 0 isn't transparent and uses the CLUT color to fill in the background
- High or low resolution? Some modes and bit depths are low enough bandwidth to do at 480p
- CRT emulation? only for low resolution layers. Probably a global setting, not per layer.
- Non-integer pixel aspect flags? to match 320x200 narrow pixels or 256x200 wide pixels. Probably global setting
64-byte cache per layer, for retaining sprite/tile graphics for reuse (more if bpp is smaller), or a readahead buffer for DDR3 for RLE and bitmap modes, which don't reuse anything anyway
No-overdraw bandwidth reduction
Have multiple hardware instances of layer renderers, all fighting for external bandwidth. Render front-to-back, with transparent pixels causing the next layer underneath to want to draw that pixel. Each layer only requests individual pixels that it needs to draw, and keeps some cache for redrawing the same sprite/tile on the same line. The first layer gets all pixels requested. The only wasted reads are when a read pixel is 0 and dispatches deeper, and 16-bit wide reads where not all the bits are used.
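A small software model of the front-to-back scheme, to show where the reads go. It is a sketch under simplifying assumptions: each layer is just a list of pixel indices for one line, index 0 means transparent, and caching and 16-bit read widths are ignored. `render_line` is a made-up name.

```python
def render_line(layers, width, background=0):
    """Resolve one scanline front-to-back.

    layers: per-line pixel lists, front-most layer first; 0 = transparent.
    Returns (line, reads), where reads[i] counts pixels requested from layer i.
    """
    line = [background] * width
    reads = [0] * len(layers)
    pending = list(range(width))          # x positions not yet resolved
    for i, layer in enumerate(layers):
        still_pending = []
        for x in pending:
            reads[i] += 1                 # only unresolved pixels are requested
            pixel = layer[x]
            if pixel != 0:
                line[x] = pixel           # resolved, deeper layers skip this x
            else:
                still_pending.append(x)   # wasted read, dispatch deeper
        pending = still_pending
        if not pending:
            break                         # every pixel is covered
    return line, reads
```

With three 4-pixel layers `[1,0,0,1]`, `[0,2,0,2]`, `[3,3,3,3]`, the front layer is asked for all 4 pixels, the middle one for 2, and the back one for 1, instead of 12 reads in total.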
Tilemap data is probably fetched regardless from DDR3. For SRAM, just grabbing & caching the current tile is fine.
Alpha Transparency
A pixel entry in the line buffer contains a 12-bit ARGB value, and goes through potentially 3 states: Uninitialized, holding transparent, opaque, with holding transparent being optional.
| Init State | Empty Pixel | Transparent Pixel | Opaque Pixel |
|---|---|---|---|
| Uninit | No-op → Uninit | Set → Transparent | Set → Opaque |
| Transparent | No-op → Transparent | Blend (or no-op) → Transparent | Blend → Opaque |
When transitioning to opaque, the pixel is complete. The common case is uninit to opaque, and shouldn't be done with extra clock cycles, unless the blend is constant and pipelined.
For simplicity, the 'alpha' is actually a transparency channel, with 0 = opaque, and 15 = fully transparent. A layer can override transparency, or any individual palette entry can have non-zero transparency. For most cases, a pixel index value of 0 is a skipped pixel and the equivalent of $F000 (fully transparent black).
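The state table can be sketched as a per-pixel step function. Assumptions, none of which are settled in these notes: pixels are (t, r, g, b) tuples with t as the transparency channel, the blend is a plain linear mix weighted by the front pixel's transparency, a transparent pixel arriving under a held transparent one is a no-op (the simpler of the two options in the table), and all names are hypothetical.

```python
UNINIT, TRANSPARENT, OPAQUE = range(3)

def blend_channel(front, back, t):
    """Mix one 4-bit channel; t is the front pixel's transparency (0 = opaque)."""
    return (front * (15 - t) + back * t + 7) // 15   # +7 rounds to nearest

def step(state, held, incoming):
    """Apply one incoming pixel (from the next layer back) to a line buffer entry.

    held / incoming are (t, r, g, b) tuples; incoming None means an empty pixel.
    Returns the new (state, held) pair.
    """
    if state == OPAQUE or incoming is None:
        return state, held                        # complete, or nothing to do
    t_in = incoming[0]
    if state == UNINIT:
        return (OPAQUE if t_in == 0 else TRANSPARENT), incoming
    if t_in != 0:
        return TRANSPARENT, held                  # no-op variant of the table
    t_held = held[0]
    rgb = tuple(blend_channel(f, b, t_held)
                for f, b in zip(held[1:], incoming[1:]))
    return OPAQUE, (0, *rgb)
```

The common uninit-to-opaque path never touches `blend_channel`, which matches the goal of not spending extra cycles on it.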
Individual layers, sprites, can have a transparency override, ignoring the palette alpha value. If they don't, transparency from the palette per color can still be obeyed. Maybe palette transparency needs to be enabled as well, just using the 12-bit color by default, meaning we can still use 0→f as clear→opaque as standard.
TODO
Buffering alpha sprites with no-overdraw is harder, and probably needs its own line buffer. Both the alpha and opaque layers need to track their own layer depth per pixel when merging together, as maybe only the alpha is visible but not the opaque sprite further back. Options:
- Alpha sprites don't show any sprites underneath, only tiles. Simple, but that's lame.
- Alpha sprites that overlap other sprites always combine with sprites and can't have tiles underneath. That is lame as well.
- The only real solution: have the front-to-back (f2b) pass trace through the sprite priority layers instead of having a single combined sprite line buffer, or split out sprite transparency into its own line buffer, with each pixel having its own priority number in both sprite line buffers.
- Just support alpha transparency in tile/bitmap/RLE layers and not sprites. Again lame, but probably the most workable solution. A small bitmap layer basically acts like a sprite anyway.
Oh, I guess we need an alpha mode as well. Averaging, brightening, dimming, threshold, gel, etc.
Indexed Bit Depth
Currently, everything is 8bpp, which is high bandwidth, more work to create artwork, and doesn't port as easily from other systems that use multiple palette swaps onscreen.
For tiles, sprites, and bitmaps, choose 1/2/4/8 bpp. Direct color 16-bit bitmaps would be separate from paletted bitmaps.
Each layer can select a CLUT. Each sprite or tile points to either a starting palette entry, or maybe a bitmask to OR into it.
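The two palette-selection options differ only in how the per-sprite value meets the pixel value. A tiny sketch, assuming a 256-entry CLUT (not decided anywhere in these notes) and made-up function names:

```python
def clut_index_add(base, pixel):
    """Starting-entry variant: needs an adder, but the base can be any entry."""
    return (base + pixel) & 0xFF

def clut_index_or(mask, pixel):
    """Bitmask variant: no adder, but only clean if the mask bits sit above the pixel bits."""
    return (mask | pixel) & 0xFF
```

For a 4bpp pixel value of 5, a base of 0x30 gives 0x35 either way. With an unaligned value of 0x33 the adder gives 0x38 while the OR gives 0x37, so the OR variant effectively restricts sub-palettes to aligned blocks.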
Bitmaps
Option to wrap. Else, it shows blank pixels outside its range. Divmod can be done once per line.
Size x/y pixels
Small bitmaps with wrapping could make for easy wipes and parallax effects, covering the entire screen with a pattern.
During render, recompute the starting address to allow raster effects including Y stretch/squash.
1/2/4/8 bpp for indexed bitmaps, option for 4-4-4-4 or 5-5-5-1 for direct color bitmaps.
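A sketch of fetching one bitmap line with the wrap option, where the modulo happens once per line and the per-pixel step is only a compare and reset. It assumes a flat row-major list with one entry per pixel (so bpp packing is ignored); the function name, parameters and blank value are illustrative.

```python
def fetch_line(bitmap, width, height, scroll_x, scroll_y, screen_y,
               out_width, wrap=True, blank=0):
    """Return out_width pixels of the bitmap for one screen line."""
    y = scroll_y + screen_y
    if wrap:
        y %= height              # per-line modulo for the row
        x = scroll_x % width     # per-line modulo for the start column
    else:
        if not 0 <= y < height:
            return [blank] * out_width
        x = scroll_x
    row = y * width              # starting address, recomputed every line
    out = []
    for _ in range(out_width):
        if wrap:
            out.append(bitmap[row + x])
            x += 1
            if x == width:       # compare and reset, no divide per pixel
                x = 0
        else:
            out.append(bitmap[row + x] if 0 <= x < width else blank)
            x += 1
    return out
```

Because the row address is rebuilt from `scroll_y` on every line, changing the scroll registers mid-frame gives the Y stretch/squash raster effects for free.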
Tiles
Layer: Tilemap pointer, tilegfx pointer
Map size x/y tiles
Option to wrap.
Option for tile 0 to be empty, without fetching the pixel data. Saves bandwidth.
For DDR3 tilemaps, it reads an entire row into a buffer. For SRAM, it fetches each tilemap entry as it becomes visible, then fetches the pixels separately, and caches them.
Tilegfx
These are self-describing, with a header that takes up tile 0's pixel space.
bpp (1,2,4,8), size (8x8, 16x16), bitmap mode, which would give it a stride of 256 bytes no matter the bpp.
Rougher ideas
Neo Geo has auto-animating tiles/sprites. 4 or 8 tiles in a row in the tileset can be cycled through for animation (cycling through the low 2 or 3 bits of the tile index). Global or layer-specific config for how many frames per step.
Meta-tilemap mode, so the tilemap holds entries that are 2x2 or 4x4 hardware tilemap entries.
Maybe a tilemap could take 2 layers, and a priority bit in the tilemap can elevate some colors within the tilemap to the higher layer, to avoid making multiple layers for things that will always be in front of the sprites? It's a feature in the Capcom CPS-1, but might be a little overspecific. Only some places in 2D RPGs can use it, if the item is tall enough that the "behind" area doesn't intersect the "in front" area.
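The auto-animation idea above (cycling the low 2 or 3 bits of the tile index) could be modeled like this. A sketch only: the parameter names and the idea of a free-running frame counter are assumptions, not a description of how the Neo Geo does it.

```python
def animated_tile(tile_index, frame_counter, frames_per_step, anim_bits):
    """Return the tile to display for this frame.

    anim_bits: 0 = no animation, 2 = 4-tile cycle, 3 = 8-tile cycle.
    Only the low anim_bits of the index change; the upper bits stay fixed.
    """
    if anim_bits == 0:
        return tile_index
    mask = (1 << anim_bits) - 1
    step = frame_counter // frames_per_step
    return (tile_index & ~mask) | ((tile_index + step) & mask)
```

Since only the low bits wrap, an animation has to start on a 4- or 8-tile boundary in the tileset to play all of its frames in order.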
Sprites
H-flip at the very minimum. If V-flip and 90° rotation are added (possible since all sprites/tiles are square), then all 8 orientations and flips are possible. Rotation is only available in SRAM.
16-bit sprite image selection from base pointer, based on bpp & size. Flip bits might be at MSBs of the word.
Color selection, direct for 1bpp, starting palette offset for 2/4bpp. Need to figure out something for 8bpp.
8×8, 16×16, 32×32, 64×64 sizes (2 bit selection, forget 24x24)
1,2,4,8 bpp (2 bit selection)
(Or should bpp & size be for the layer? Might make for simpler implementation, but varying sprite sizes are probably good. Bpp might still be a consideration for layer config.)
Overdraw avoidance eliminates pulling in unseen pixels, which prevents hardware collision detection, which is fine.
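To check the claim that H-flip, V-flip and a 90° rotate cover all 8 orientations of a square sprite, here is a small model. The order of operations (rotate clockwise first, then flip) is an arbitrary choice for the sketch, and `transform` is a made-up name.

```python
def transform(tile, hflip=False, vflip=False, rot90=False):
    """Apply the three orientation bits to a square tile given as a list of rows."""
    if rot90:
        # 90 degrees clockwise: reverse the row order, then transpose
        tile = [list(row) for row in zip(*tile[::-1])]
    if hflip:
        tile = [row[::-1] for row in tile]
    if vflip:
        tile = tile[::-1]
    return [list(row) for row in tile]

def all_orientations(tile):
    """Every combination of the three bits, as a list of 8 tiles."""
    return [transform(tile, h, v, r)
            for r in (False, True)
            for v in (False, True)
            for h in (False, True)]
```

For an asymmetric tile such as `[[1, 2], [3, 4]]`, the 8 bit combinations produce 8 distinct results, so no fourth bit is needed.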
Rougher ideas
Are all sprites independently defined and layered? Are they all from the same base pointer? Would need multiple layer definitions to give multiple base pointers, and that might be a good idea? Each layer involved could take 64 sprites out of 256, max 4 sprite "layers". Maybe that's not something that should be layer-based, but global to the sprite system. 4 base pointers, 4 groups of 64 sprites, 4 hardware elements scanning the sprite registers in parallel.
Unlimited height sprites? fixed width
8bpp color register could be used to bank a subset of colors, maybe a color range (0-7) can be cycled while others are fixed. Think of Age of Empires 1 recoloring for instance. Or just use that to select a CLUT, overriding the layer's selection.
Select the color to be transparent? If using a fixed smaller palette, like DB16/32, then each sprite could pick a different one. Pico-8 has a 16-bit mask for which colors to include or not, which is interesting.
Cut-out sprites wouldn't display, but would clear any pixel from sprites above it, allowing sprites below to show through. Or, it could skip the sprite immediately below if it has a pixel, masking a single sprite, which might be easier to implement.
Figure out sprite zooming. No rotation, just scaling, not inverting with this? Can grow or shrink independently in x & y. Maybe Bresenham? Probably want sub-pixel accuracy, 16-bit with fixed point? Or full 32-bit fixed? x1/x2/y1/y2 dest rectangle maybe?
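One way the zoom could work is a fixed-point source step per destination pixel. This is just a sketch to make the question concrete: the 8.8 fixed-point format and the nearest-neighbor sampling are assumptions, and nothing here is decided. The same stepping would apply independently in y.

```python
FRAC_BITS = 8   # assumed 8.8 fixed point; 16.16 would work the same way

def scale_row(src, dest_width):
    """Stretch or shrink one row of source pixels to dest_width pixels."""
    if dest_width <= 0:
        return []
    # Source pixels advanced per destination pixel, in fixed point
    step = (len(src) << FRAC_BITS) // dest_width
    pos = 0
    out = []
    for _ in range(dest_width):
        out.append(src[pos >> FRAC_BITS])   # integer part selects the source pixel
        pos += step
    return out
```

Expressing the zoom as a dest rectangle (x1/x2) and deriving `step` from it keeps the per-pixel hardware down to one add and one shift.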
RLE
Layer has a height (width is undefined).
The base pointer points to an array of height × 16-bit offsets, one for each line, indexed from the base pointer. Each line is an undemarcated concatenation of tags:
0lllllll cccccccc: length + color pair; color 0 is typically transparent
1lllllll c...: length + literal pixels
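A decoder sketch for one line of this format. Several details are assumptions since the notes leave them open: offsets are little-endian (natural for the 65816), pixels are 8 bits each, a line simply ends when the requested width is filled (there is no end marker), and a zero length is treated as a stop. `decode_rle_line` is a hypothetical name.

```python
def decode_rle_line(data, base, line, width):
    """Decode one scanline of the RLE layer into a list of pixel indices.

    data: bytes-like memory image; base: address of the per-line offset table.
    """
    table_pos = base + 2 * line
    offset = data[table_pos] | (data[table_pos + 1] << 8)   # little-endian assumed
    pos = base + offset                                     # offsets count from base
    out = []
    while len(out) < width:
        tag = data[pos]
        length = tag & 0x7F
        pos += 1
        if length == 0:
            break                          # assumed guard; meaning is undefined
        if tag & 0x80:
            out.extend(data[pos:pos + length])   # 1lllllll: literal pixels follow
            pos += length
        else:
            out.extend([data[pos]] * length)     # 0lllllll: one color repeated
            pos += 1
    return out[:width]
```

For example, with a two-line table followed by the tags `03 05 82 07 09` and `05 00`, line 0 decodes to three pixels of color 5 followed by 7 and 9, and line 1 to five transparent pixels.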
Uses:
- Wipes
- Solid color borders
- Raster bars without interrupts
- SNES-style explosion or light ray effects (esp with transparency)
- Compressed cel style images or animations
- Polygon filling, especially with multiple layers instead of merging spans into 1 layer
- Cheap enough to run in hi-res?