proxy70

mrmuk [01:45] reload the screenshot. now it has a white triangle (which doesn't look white at all due to the darker scanlines) :)
Pezac [01:46] mrmuk fanx mrmuk
mrmuk [01:46] now you can clearly see the ham colorbleeding
Pezac [01:47] haha! the red trash before the white :)
Pezac [01:47] yes.. nice!
Pezac [01:48] you're writing RGBB?
mrmuk [01:49] RGBG
Pezac [01:49] any reason for that?
mrmuk [01:49] yes
Pezac [01:49] would you enlighten me?
mrmuk [01:49] you can check overflow of R using cmp.l, overflow of G using cmp.b, and overflow of B using cmp.w
mrmuk [01:49] (when adding together colors)
mrmuk [01:50] although i really wish i could ditch those compares completely
Pezac [01:50] ouch that was something to think of
Pezac [01:50] I need time :)
Pezac [01:51] but there is some gain in having RGBB .. the last two bytes are written quickly :)
mrmuk [01:51] add.l d0,d1; cmp.l #$40000000,d0 tells you if R got too big, and if it did, you or.l #$3f000000,d0; and.l #$3fffffff,d0
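
The single add followed by size-specific compares can be modelled outside 68k assembly. Below is a minimal Python sketch, assuming 6-bit channels packed as R, G, B, G bytes with R in the top byte. The names `pack_rgbg` and `add_rgbg` are hypothetical, the `if` statements stand in for the cmp-and-branch pairs, and the masks are chosen here so that each clamp leaves the other channels untouched.

```python
MASK32 = 0xFFFFFFFF

def pack_rgbg(r, g, b):
    """Pack 6-bit components (0..63) as R, G, B, G bytes, R in the top byte."""
    return (r << 24) | (g << 16) | (b << 8) | g

def add_rgbg(p, q):
    """Add two RGBG pixels with one 32-bit add, then clamp each channel to 63."""
    # Each byte is at most 63 + 63 = 126, so no carry crosses a byte boundary.
    s = (p + q) & MASK32
    # Like cmp.l: a longword compare is decided by the top byte, which is R.
    if s >= 0x40000000:
        s = (s | 0x3F000000) & 0x3FFFFFFF
    # Like cmp.w: the low word is B:G, so the compare is decided by B.
    if (s & 0xFFFF) >= 0x4000:
        s = (s | 0x00003F00) & 0xFFFF3FFF
    # Like cmp.b: the low byte is the second copy of G; fix both G bytes.
    if (s & 0xFF) >= 0x40:
        s = (s | 0x003F003F) & 0xFF3FFF3F
    return s
```

With RGBB the low byte and the low word would both be decided by B, so G would have no compare of its own. That appears to be the point of repeating G instead.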
mrmuk [01:52] pezac: what do you mean?
mrmuk [01:52] i always read/write longwords when dealing with ham8
Pezac [01:52] well.. you calculate r,g,b and then you just write the b byte twice
mrmuk [01:53] that's probably slower
mrmuk [01:53] it's important to utilize parallelism here and 32bit r/w
mrmuk [01:53] it's kind of like mmx actually _:)
Pezac [01:53] I'm not.. I once examined the way of "making a longword" before writing but it was not faster
Pezac [01:53] the datacache is working fine :)
mrmuk [01:53] i don't even "make a longword"
mrmuk [01:53] all the source textures etc are in RGBG format
Pezac [01:54] but why bother? my writes go to dcache
mrmuk [01:54] so a plain old tmapper is just move.l (a0,d0.l*4),(a1)+ (with the additional stuff)
mrmuk [01:54] instead of move.b (a0,d0.l),(a1)+
Pezac [01:55] of course.. but I'm thinking of the case where you must get a color, then calculate something and put back
Pezac [01:55] where you have 3 operations, one for each color
mrmuk [01:55] yeah well... i'd really like to find a way to lose those cmp's when adding
mrmuk [01:56] and btw you need to do 3 adds when adding two pixels together, right?
mrmuk [01:56] i only need one, plus the clamping code
Pezac [01:56] let's see.. yomat did some trick for adding in the text-stuff for the latest intro
Pezac [01:56] it was no cmp's afaik
mrmuk [01:57] hmm scc could probably be used
mrmuk [01:57] bah, that would still require cmp
Chip^Nat [01:57] or'ing it instead?
mrmuk [01:57] just no branches
Pezac [01:57] afraid of branches?
mrmuk [01:57] of course
Pezac [01:57] I will take a look
mrmuk [01:58] they really should've added something like mmx to 060 ;)
mrmuk [01:58] then we could have an instruction like "clamp all bytes in Dx to [0, 63]"
Pezac [01:58] why are you afraid of branches?
mrmuk [01:59] because they might miss the branch cache and fuck everything up
Pezac [01:59] :) is that so?
Chip^Nat [01:59] stall-o-mania :)
Pezac [02:00] fuck everything up?
mrmuk [02:00] just check the 060 cycle tables for branches
Chip^Nat [02:00] it has to drop everything in the prefetch and start fetching from a new spot
Pezac [02:01] it sounds like a disaster.. the real result is fast ;)
mrmuk [02:02] of course it's not fast
mrmuk [02:02] -not :D
mrmuk [02:02] of course it's fast, but it could be faster!
mrmuk [02:02] let's see now
mrmuk [02:02] not predicted, forward, taken: 7(0/0)
Chip^Nat [02:03] it's like pipelining... it might make the difference between 25 or 50 fps :)
mrmuk [02:03] not predicted, forward, not taken: 1(0/0)
Pezac [02:03] i thought you meant the unpredictability
mrmuk [02:03] predicted correctly as taken: 0(0/0)
mrmuk [02:03] predicted correctly as not taken: 1(0/0)
mrmuk [02:03] predicted incorrectly: 7(0/0)
mrmuk [02:04] so it's 0-7 cycles, and the 7 cycle cases are probably quite common...
Pezac [02:04] yes and will differ from frame to frame..
mrmuk [02:05] scc is always 1(0/0)
mrmuk [02:05] and you'd have three of them per pixel, meaning 3 cycles vs. 0-21 cycles
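
The "3 cycles vs. 0-21 cycles" figure is plain arithmetic on the numbers quoted above. A small Python sketch, using only the cycle counts pasted in the channel and assuming three overflow checks per pixel. It counts only the branch or the scc itself, not the instructions around it.

```python
# Branch costs in cycles, as quoted in the channel from the 060 cycle tables.
BRANCH_CYCLES = {
    "not predicted, forward, taken": 7,
    "not predicted, forward, not taken": 1,
    "predicted correctly as taken": 0,
    "predicted correctly as not taken": 1,
    "predicted incorrectly": 7,
}
SCC_CYCLES = 1          # scc is quoted as always 1(0/0)
CHECKS_PER_PIXEL = 3    # one overflow check each for R, G and B

def branch_range(checks=CHECKS_PER_PIXEL):
    """Best and worst total cycles if every check uses a branch."""
    costs = BRANCH_CYCLES.values()
    return checks * min(costs), checks * max(costs)

def scc_total(checks=CHECKS_PER_PIXEL):
    """Total cycles if every check uses scc instead of a branch."""
    return checks * SCC_CYCLES

print(branch_range(), scc_total())
```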
mrmuk [02:06] but i wonder how to implement the clamping with scc at all
mrmuk [02:07] did scc set all 32 bits of a register or just 8?
Pezac [02:07] 8 i think
mrmuk [02:07] damn
Pezac [02:07] not sure
Chip^Nat [02:07] just 8
mrmuk [02:08] hmm, ext.b extends from b->l?
Chip^Nat [02:08] mrmuk: yep
Pezac [02:08] extb.l
Chip^Nat [02:08] ah, yes
Chip^Nat [02:08] :)
Pezac [02:08] :)
mrmuk [02:08] oops
Chip^Nat [02:08] 020+ :)
Pezac [02:08] yes 020 got b->l :)
Pezac [02:08] go motorola :)
mrmuk [02:09] and that's 1(0/0)
Pezac [02:10] goold old cycle tracking
Pezac [02:10] good
mrmuk [02:11] ok let's see... cmp.l #$40,d0; spl d1; extb.l d1; and.l #$3f000000,d1; or.l d0,d1; and.l #$3fffffff,d0
mrmuk [02:11] damn, that's a lot of instructions
mrmuk [02:11] that would clamp red
Pezac [02:11] i found the adding code now
mrmuk [02:12] cmp.w #$40,d0; spl d1; ext.w d1; and.w #$3f00,d1; or.w d0,d1; and.w #$3fff,d0
mrmuk [02:12] that would clamp B
Pezac [02:12] is it possible to paste ~10 lines?
mrmuk [02:12] cmp.b #$40,d0; spl d1; extb.l d1; and.l #$3f003f,d1; or.l d0,d1; and.l #$3fff3f,d0
mrmuk [02:12] and that would clamp G
mrmuk [02:13] (assuming RGBG)
mrmuk [02:13] 18 instructions
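
The scc idea is to turn each compare result into an all-ones or all-zero mask and then saturate with and/or, with no branch. Below is a rough Python model of that pattern, assuming the RGBG layout with R in the top byte and input bytes no larger than 126 (the sum of two 6-bit values). The helper names are hypothetical. The compare constants and masks are picked here so that each step leaves the other channels alone, so they are an assumption and not necessarily the constants typed in the channel.

```python
def all_ones_if(cond):
    """Model scc followed by extb.l: 0xFFFFFFFF if cond holds, else 0."""
    return -int(cond) & 0xFFFFFFFF

def clamp_rgbg_branchless(s):
    """Clamp every channel of an RGBG longword to 63 using masks only."""
    # R: the longword compare is decided by the top byte.
    m = all_ones_if(s >= 0x40000000)
    s = (s | (m & 0x3F000000)) & 0x3FFFFFFF
    # B: the compare on the low word (B:G) is decided by B.
    m = all_ones_if((s & 0xFFFF) >= 0x4000)
    s = (s | (m & 0x00003F00)) & 0xFFFF3FFF
    # G: the low byte is the second G; saturate both G bytes together.
    m = all_ones_if((s & 0xFF) >= 0x40)
    s = (s | (m & 0x003F003F)) & 0xFF3FFF3F
    return s
```

The final `and` in each step can run unconditionally: a channel that did not overflow already has its top two bits clear, so masking them changes nothing.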
Pezac [02:13] is it possible to paste ~10 lines?
mrmuk [02:13] paste to /query
Pezac [02:14] oh...
mrmuk [02:14] the bots will probably kick you if you paste them here
Pezac [02:22] or being kicked for excess flood
mrmuk [02:22] hmm. now i should add tmapping
Pezac [02:35] night muk :)
mrmuk [02:45] oh damn
mrmuk [02:45] the uv slope setup probably can't be done with 32 bit fixed point
mrmuk [02:46] 16.16 * 16.16 + 16.16 ...
mrmuk [02:46] well, i could drop 16 bits in total from those multiplicands...
mrmuk [02:52] what's the fixed point format of uv on ps2?
mrmuk [02:54] oh, 12.4 with each texel being 1.0
mrmuk [02:54] excellent, i'll borrow that format since it seems to work well :)
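
The 32-bit problem with 16.16 * 16.16 is that the raw product carries 32 fraction bits and no longer fits in a longword. A minimal Python sketch of that, plus the 12.4 layout mentioned above (12 integer bits, 4 fraction bits, one texel equal to 1.0). The function names are hypothetical, and dropping 8 bits from each operand is just one way to drop 16 bits in total.

```python
def to_fixed(x, frac_bits):
    """Convert a float to fixed point with the given number of fraction bits."""
    return int(round(x * (1 << frac_bits)))

def mul_16_16_wide(a, b):
    """Exact 16.16 multiply; the intermediate a*b needs more than 32 bits."""
    return (a * b) >> 16

def mul_16_16_narrow(a, b):
    """Drop 8 bits from each operand first, so the product is already 16.16.

    Loses the low 8 fraction bits of each operand, and the result still
    has to fit in 32 bits for this to work on a 32-bit register.
    """
    return (a >> 8) * (b >> 8)

def split_12_4(uv):
    """Split a 12.4 coordinate into (texel index, 1/16ths of a texel)."""
    return uv >> 4, uv & 0xF

a = to_fixed(1.5, 16)
b = to_fixed(2.25, 16)
print((a * b).bit_length(), mul_16_16_wide(a, b), mul_16_16_narrow(a, b))
print(split_12_4(to_fixed(10.5, 4)))
```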
-irc.ircnet. [12:29] jungl|st (~jungl|st@as3-2-4.hs.hs.bonet.se) changed mode: +o Chip-Nat
ppro [12:50] nice how some people assume there is 32 bit zbuffer support... *sigh*
ppro [12:54] hey...
ppro [12:55] my gf2 now lists 24 bit zbuffer formats for 16 bit modes as well...
ppro [12:55] I thought it could only do 16 bit with 16 bit mode, first...
ppro [12:55] or was I confused with something else...