-->

Archive for March, 2009

Optimising on the Casio CFX-9850C

Monday, March 2nd, 2009

Hello everyone,

I reckon it’s time for an update.

The Casio CFX-9850C is not exactly known for its RAW POWAH. The front has a text proudly proclaiming it’s a COLOR POWER GRAPHIC 64 KB device, and it indeed does have three colors (orange, blue and green combine into black) and a whopping 64 KB of RAM memory. And even though I’ve had it for three years, I’ve only managed to use up 20,127 bytes in the Program section (22,529 bytes free) without deleting much programs. 64 KB is a lot of plain text.

As you may have guessed, I enjoy making programs for this little calculator. Even though it sucks donkey balls.

Features programming-wise:

  • 128 x 64 pixel display giving you 21 characters in 7 rows (the 8th row is reserved for system menu’s)
  • 26 letter variables (A-Z) to store values in
  • 6 lists (one dimensional arrays) of undetermined length
  • 26 matrices (two dimensional arrays) of undetermined size

Not so cool features:

  • The luxurious Program menu gives you three choices: either you figure out the way you want your programs to be organized before you actually start typing, you retype one program into the other and in reverse and rename the files or you use the search function.
  • Functions are quite a few characters which means you will end up with lots and lots of rows of code very quickly.
  • Functions are ridiculously hidden: Shift -> F4 -> F6 -> F2 -> F2 for F-Line for instance.
  • There’s no easy way to backtrack in your code, so you’ll be using the down button. A lot.
  • The data cable can’t be bought anywhere apparently so back up your code by hand.

And yet, I love it.

In short, it makes me feel like a hacker on an 80’s terminal. Programming on the 9850C plus has taught me a lot about optimizing. Some of my findings:

  • If you don’t need to redraw the entire scene, don’t! Only redraw the portions you’re actually using.
  • Keep your code organized, or you’ll lose readability.
  • Reading a variable is much lighter than calculating the same value.

Now that last one is a biggy. To demonstrate this (and what this blog is about), I will show you one of the programs I’m most proud of: COS and its cousins.

COS in its craptastic phone quality glory!

COS in its craptastic phone quality glory!

COS started out as something to keep me busy during boring math classes. I already knew about cosine and sine and I wanted to show off my mad skillzzz with a rotating 3D cube. The problem with this implementation is that every step (every rotation of the cube) every point (x and y) is being calculated using sine and cosine for every line being drawn. The rotation is done with 5 degree intervals. The code comes in at 579 bytes. Pretty reasonable.

Here’s what some of the code looks like:

F-Line 16+Intg (cos (
A)xW),16+Intg (sin (A
)xH),16+Intg (cos (A)
xW),32+Intg (sin (A)x
H)<|
A+90->B<

Do that times eight and you’ll realize why this is so slow.

I timed it with my phone just now. It takes 36 seconds to render 10 frames which means it has a seconds per frame (spf) of 3.6. If you don’t know what that means, games are usually run at 60 frames per second (fps). It’s kinda slow.

And I knew I could do better.

So the next version, COS3 (there is no COS2, I must’ve deleted it) calculates the x and y of each point, puts it in a letter variable and puts that in the F-Line functions.

COS3 is twice as big and therefor automatically better.

COS3 is twice as big and therefor automatically better.

Sample code:

X+Int (cos (Z+270)xW)
->G:Y+Int (sin (Z+270)
xI)->H<|
F-Line A,B,C,D<|
F-Line C,D,E,F<|
F-Line E,F,G,H<|
F-Line G,H,A,B<|
F-Line A,B+J,C,D+J<|

This makes it much faster; twice as fast in fact. It takes 18 seconds to do 10 frames, which gives 1.8 spf. Still, I wasn’t pleased with it. I wanted it do at least 1 frame per second.

I was still optimistic, and that’s why a lot of values that could’ve been a constant (x and y of the cube, width and height, rotation speed) were left as variables.

But optimisation also means stripping out unnecessary functionality, and that’s what I did with COS4.

Displaying the text actually takes about one third as long as calculating the values.

Displaying the text actually takes about one third as long as calculating the values.

COS4 is one of the newer implementations that I started working after actually learning about optimisation in school. One of the things I was taught is to use look up tables. So what I did was put the values into a list and read them out when needed.

1->Z<|
While Z<37<|
64+Int (cos((Z-1)x10
)x32+0.5)->List 1[Z]<|
Int (sin ((Z-1)x10)x1
6+0.5)->List 2[Z]<|
Z+1->Z<|
WhileEnd<|

The rest of the code was pretty much the same, except I read the values from a list instead of the letter variables. This didn’t result in the dramatic as it did before though; this code renders frames at 1.5 spf and it has a start up time of about 5 seconds. It also dramatically increased the filesize to 687 bytes.

Optimizing at this point didn’t really seem to have much point.

So I tried anyway.

As it turns out, the A-Z variables work a bit like a the N1 cache in a normal computer: the memory is closer to the processor so it can read values a bit quicker than from RAM (the lists and matrices). The N1 cache is extremely limited, but faster than the big RAM.

Translating that bit of computing knowledge to the calculator: putting the values into letter variables before putting them into F-Line actually speeds up things a bit!

Here’s the complete code of COS5 for your viewing pleasure:

"MAKE TABLE..."<|
Deg<|
0->Z:1->Y<|
While Z<360<|
64+Int (cos (Z)x32)->L
ist 1[Y]<|
Int (sin (Z)x16)->List
 2[Y]<|
Z+10->Z:Y+1->Y<|
WhileEnd<|
ClrGraph<|
CoordOff:GridOff:Axes
Off:LabelOff<|
1->W:10->X:19->Y:28->Z<|
ViewWindow 0,127,1,63
,0,1<|
StoV-Win 1<|
Lbl 1<|
RclV-Win 1<|
'<|
List 1[W]->A<|
List 1[X]->B<|
List 1[Y]->C<|
List 1[Z]->D<|
'<|
List 2[W]+16->E<|
List 2[X]+16->F<|
List 2[Y]+16->G<|
List 2[Z]+16->H<|
'<|
List 2[W]+47->I<|
List 2[X]+47->J<|
List 2[Y]+47->K<|
List 2[Z]+47->L<|
'<|
F-Line A,E,B,F<|
F-Line B,F,C,G<|
F-Line C,G,D,H<|
F-Line D,H,A,B<|
'<|
F-Line A,I,B,J<|
F-Line B,J,C,K<|
F-Line C,K,D,L<|
F-Line D,L,A,I<|
'<|
F-Line A,E,A,I<|
F-Line B,F,B,J<|
F-Line C,G,C,K<|
F-Line D,H,D,L<|
'<|
W+1->W<|
X+1->X<|
Y+1->Y<|
Z+1->Z<|
W>36=>1->W<|
X>36=>1->X<|
Y>36=>1->Y<|
Z>36=>1->Z<|
Goto 1

The biggest bottleneck is F-Line. You can actually see every line being drawn. Unfortunately, this is as deep as I can go (it’s not possible to go to assembler) and I can’t really do anything about it.

Final thoughts:

Video of COS5 running (crap quality):

Code deconstruction:

  • COS - 579 bytes - 3.6 spf
  • COS3 - 374 bytes - 1.8 spf
  • COS4 - 687 bytes - 1.5 spf
  • COS5 - 495 bytes - 1.4 spf

As you can see, the actual results (the spf) follows a very nice downward graph. The more time I put into it, the less result I got from it.

The first optimization resulted in a 100% speed boost, while the second gave 20% and the third a measly 7%.

But what that means for game developers is two things: either you can run the same rendering at a higher speed or you can put in more renderings.

If this were  a 3D engine it would mean that I could either put 2 cubes in the game and have it run at the same speed as 1 cube used to or that the 1 cube now runs twice as fast.

But secretly, we also do it because it’s awesome.

It’s like looking at a maze from above. First you find a route to the exit, then you find the most efficient route, then you start knocking down some unnecessary walls until you end up with a maze that looks nothing like it used to, yet still does the same thing.

It’s the ultimate programmer challenge.

-knighty