mrmuk [01:45] reload the screenshot. now it has a white triangle (which doesn't look white at all due to the darker scanlines) :)
Pezac [01:46] mrmuk fanx mrmuk
mrmuk [01:46] now you can clearly see the ham colorbleeding
Pezac [01:47] haha! the red trash before the white :)
Pezac [01:47] yes.. nice!
Pezac [01:48] you're writing RGBB?
mrmuk [01:49] RGBG
Pezac [01:49] any reason for that?
mrmuk [01:49] yes
Pezac [01:49] would you enlighten me?
mrmuk [01:49] you can check overflow of R using cmp.l, overflow of G using cmp.b, and overflow of B using cmp.w
mrmuk [01:49] (when adding together colors)
mrmuk [01:50] although i really wish i could ditch those compares completely
Pezac [01:50] ouch that was something to think of
Pezac [01:50] I need time :)
Pezac [01:51] but there is some gain in having RGBB .. the last two bytes is written quickly :)
mrmuk [01:51] add.l d0,d1; cmp.l #$40000000,d0 tells you if R got too big, and if it did, you or.l #$3f000000,d0; and.l #$3fffffff,d0
mrmuk [01:52] pezac: what do you mean?
mrmuk [01:52] i always read/write longwords when dealing with ham8
Pezac [01:52] well.. you calculate r,g,b and then you just write the b byte twice
mrmuk [01:53] that's probably slower
mrmuk [01:53] it's important to utilize parallelism here and 32bit r/w
mrmuk [01:53] it's kind of like mmx actually _:)
Pezac [01:53] I'm not.. I once examined the way of "making a longword" before writing but it was not faster
Pezac [01:53] the datacache is working fine :)
mrmuk [01:53] i don't even "make a longword"
mrmuk [01:53] all the source textures etc are in RGBG format
Pezac [01:54] but why bother? my writes goes to dcache
mrmuk [01:54] so a plain old tmapper is just move.l (a0,d0.l*4),(a1)+ (with the additional stuff)
mrmuk [01:54] instead of move.b (a0,d0.l),(a1)+
Pezac [01:55] of course.. but I'm thinking of the case where you must get a color, then calculate something and put back
Pezac [01:55] where you have 3 operations, one for each color
mrmuk [01:55] yeah well... i'd really like to find a way to lose those cmp's when adding
mrmuk [01:56] and btw you need to do 3 adds when adding two pixels together, right?
mrmuk [01:56] i only need one, plus the clamping code
Pezac [01:56] let's see.. yomat did some trick for adding in the text-stuff for the latest intro
Pezac [01:56] it was no cmp's afaik
mrmuk [01:57] hmm scc could probably be used
mrmuk [01:57] bah, that would still require cmp
Chip^Nat [01:57] or'ing it instead?
mrmuk [01:57] just no branches
Pezac [01:57] afraid of branches?
mrmuk [01:57] of course
Pezac [01:57] I will take a look
mrmuk [01:58] they really should've added something like mmx to 060 ;)
mrmuk [01:58] then we could have an instruction like "clamp all bytes in Dx to [0, 63]"
Pezac [01:58] why are you afraid of branches?
mrmuk [01:59] because they might miss the branch cache and fuck everything up
Pezac [01:59] :) is that so?
Chip^Nat [01:59] stall-o-mania :)
Pezac [02:00] fuck everything up?
mrmuk [02:00] just check the 060 cycle tables for branches
Chip^Nat [02:00] it has to drop everything in the prefetch and start fetching from a new spot
Pezac [02:01] it sounds like a disaster.. the real result is fast ;)
mrmuk [02:02] of course it's not fast
mrmuk [02:02] -not :D
mrmuk [02:02] of course it's fast, but it could be faster!
mrmuk [02:02] let's see now
mrmuk [02:02] not predicted, forward, taken: 7(0/0)
Chip^Nat [02:03] it's like pipelining... it might make the difference between 25 or 50 fps :)
mrmuk [02:03] not predicted, forward, not taken: 1(0/0)
Pezac [02:03] i thought you meant the unpredictability
mrmuk [02:03] predicted correctly as taken: 0(0/0)
mrmuk [02:03] predicted correctly as not taken: 1(0/0)
mrmuk [02:03] predicted incorrectly: 7(0/0)
mrmuk [02:04] so it's 0-7 cycles, and the 7 cycle cases are probably quite common...
Pezac [02:04] yes and will differ from frame to frame..
mrmuk [02:05] scc is always 1(0/0)
mrmuk [02:05] and you'd have three of them per pixel, meaning 3 cycles vs. 0-21 cycles
mrmuk [02:06] but i wonder how to implement the clamping with scc at all
mrmuk [02:07] did scc set all 32 bits of a register or just 8?
Pezac [02:07] 8 i think
mrmuk [02:07] damn
Pezac [02:07] not sure
Chip^Nat [02:07] just 8
mrmuk [02:08] hmm, ext.b extends from b->l?
Chip^Nat [02:08] mrmuk: yep
Pezac [02:08] extb.l
Chip^Nat [02:08] ah, yes
Chip^Nat [02:08] :)
Pezac [02:08] :)
mrmuk [02:08] oops
Chip^Nat [02:08] 020+ :)
Pezac [02:08] yes 020 got b->l :)
Pezac [02:08] go motorola :)
mrmuk [02:09] and that's 1(0/0)
Pezac [02:10] goold old cycle tracking
Pezac [02:10] good
mrmuk [02:11] ok let's see... cmp.l #$40,d0; spl d1; extb.l d1; and.l #$3f000000,d1; or.l d0,d1; and.l #$3fffffff,d0
mrmuk [02:11] damn, that's a lot of instructions
mrmuk [02:11] that would clamp red
Pezac [02:11] i found the adding code now
mrmuk [02:12] cmp.w #$40,d0; spl d1; ext.w d1; and.w #$3f00,d1; or.w d0,d1; and.w #$3fff,d0
mrmuk [02:12] that would clamp B
Pezac [02:12] is it possible to paste ~10 lines?
mrmuk [02:12] cmp.b #$40,d0; spl d1; extb.l d1; and.l #$3f003f,d1; or.l d0,d1; and.l #$3fff3f,d0
mrmuk [02:12] and that would clamp G
mrmuk [02:13] (assuming RGBG)
mrmuk [02:13] 18 instructions
Pezac [02:13] is it possible to paste ~10 lines?
mrmuk [02:13] paste to /query
Pezac [02:14] oh...
mrmuk [02:14] the bots will probably kick you if you paste them here
Pezac [02:22] or being kicked for excess flood
mrmuk [02:22] hmm. now i should add tmapping
Pezac [02:35] night muk :)
mrmuk [02:45] oh damn
mrmuk [02:45] the uv slope setup probably can't be done with 32 bit fixed point
mrmuk [02:46] 16.16 * 16.16 + 16.16 ...
mrmuk [02:46] well, i could drop 16 bits in total from those multiplicands...
mrmuk [02:52] what's the fixed point format of uv on ps2?
mrmuk [02:54] oh, 12.4 with each texel being 1.0
mrmuk [02:54] excellent, i'll borrow that format since it seems to work well :)
-irc.ircnet. [12:29] jungl|st (~jungl|st@as3-2-4.hs.hs.bonet.se) changed mode: +o Chip-Nat
ppro [12:50] nice how some people assume there is 32 bit zbuffer support... *sigh*
ppro [12:54] hey...
ppro [12:55] my gf2 now lists 24 bit zbuffer formats for 16 bit modes as well...
ppro [12:55] I thought it could only do 16 bit with 16 bit mode, first...
ppro [12:55] or was I confused with something else...
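
Note on the clamping discussion (01:49 to 02:13): the idea in the log is to add two RGBG longwords with a single 32-bit add and then clamp each channel to 63 without branching. The sketch below models that arithmetic in C rather than 68k assembly, so it says nothing about 060 cycle counts. The function names are hypothetical, and the layout (R in the top byte, G duplicated in bytes 2 and 0, every channel in 0..63) is an assumption read off the log. `rgbg_add_cmp` follows the three-compare idea; `rgbg_add_swar` is a different, mask-based technique that is not from the log, shown only as one possible way to drop the compares.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack 6-bit channels into the assumed RGBG layout
   (R in bits 31..24, G in 23..16, B in 15..8, G again in 7..0). */
static uint32_t rgbg_pack(uint32_t r, uint32_t g, uint32_t b)
{
    assert(r < 64 && g < 64 && b < 64);
    return (r << 24) | (g << 16) | (b << 8) | g;
}

/* Compare-based saturating add: one 32-bit add, then one range check
   per channel, mirroring the cmp.l / cmp.w / cmp.b idea from the log. */
static uint32_t rgbg_add_cmp(uint32_t p, uint32_t q)
{
    /* 63 + 63 = 126 still fits in a byte, so no carry crosses channels. */
    uint32_t s = p + q;

    if (s >= 0x40000000u)              /* R is the top byte of the longword */
        s = (s & 0x00ffffffu) | 0x3f000000u;
    if ((s & 0xffffu) >= 0x4000u)      /* B is the top byte of the low word */
        s = (s & 0xffff00ffu) | 0x00003f00u;
    if ((s & 0xffu) >= 0x40u)          /* G is the low byte, mirrored in byte 2 */
        s = (s & 0xff00ff00u) | 0x003f003fu;
    return s;
}

/* Mask-based saturating add (not from the log): after the add, bit 6 of a
   byte is set exactly when that channel reached 64 or more. */
static uint32_t rgbg_add_swar(uint32_t p, uint32_t q)
{
    uint32_t s = p + q;
    uint32_t over = s & 0x40404040u;     /* 0x40 in each overflowed byte */
    uint32_t fill = over - (over >> 6);  /* 0x40 - 0x01 = 0x3f in those bytes */
    return (s | fill) & 0x3f3f3f3fu;     /* saturate, then drop bits 6 and 7 */
}
```

For example, adding `rgbg_pack(40, 10, 63)` and `rgbg_pack(30, 20, 5)` gives `0x3f1e3f1e` with either function: R and B saturate at 63 while G becomes 30. The mask version needs no compares at all, but whether it would beat the scc sequences on real hardware is not something this sketch can show.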