EXAPUNKS

EXAPUNKS

Not enough ratings
Hardcore optimisations for beating the top percentile
By Eil Knight
Curious how people acheive such results in histagrams? Spoiler-free guide about some of those techniques.
2
3
   
Award
Favorite
Favorited
Unfavorite
Intro
Game offers 3 metrics:
  1. Cycles count. The most hardcore metric, sometimes requires a lot of creativity and imaginations to optimise this. Meaning: amount of ticks of time from the beginning unlti all the task conditions are fulfilled. Each cycle all your EXAs usually execute 1 line of code, however some operations can be blocked for a while and consume more time. Also writing to M register takes 2 ticks and MARK pesudo-instruction doesn't consume a cycle. If you have 10 EXAs working in parralel, they can do 10 instructions per one cycle. When every condition is fulfilled (usually it ends by death of the last EXA) game counts spent cycles. Also game runs 100 tests and puts the bigges cycles count to the metrics.
    Conditions can be:
    • create a file with the specific content
    • delete the specific file
    • modify or move files
    • leave no trace, which means all EXAs are killed
    • transmit signal sequence to hardware register
  2. Size. Amount of lines of code including MARK. @REP N macro command is not counted itself but N generated code blocks will be counted to the metrics.
  3. Activity. The easiest metrics for optimisation. Just counts amount of LINK calls.
Activity
Relativity easy to reach the top percentile here. REPL is your friend.
Imagine 3 consequent nodes: 801, 802, 803, one EXA works in 802, another in 803.
Trivial approach:
  • EXA A
    LINK 800 LINK 801 NOTE DO SOME WORK
  • EXA B
    LINK 800 LINK 801 LINK 802 NOTE DO SOME WORK
We have 5 LINK calls which means activity = 5

Just replace it into:
LINK 800 LINK 801 REPL SECOND NOTE DO SOME EXA A WORK HALT MARK SECOND NOTE DO SOME EXA B WORK
Now second EXA is created in node 801 instead of your host and it doesn't need to do extra 2 movements. Now activity = 3.
Size
Size is more interesting metric in terms of optimisation. Sometimes it correlates with cycles count as well but not always.

By the way, blank lines are not counted, use it for readability!

Let's discuss some techiques.

Combine arithmetics operations with reading/writing

Let's assume that your code requires to modify initial value
COPY F T MULI T 2 T COPY T M

can be refactored into
MULI F 2 M

This also decreases cycles count

Don't use DROP and HALT everywhere
When there are no more lines of code, EXA drops file and halts automatically.
Example:
LINK 800 GRAB 200 LINK 800 DROP HALT
from zine equals to
LINK 800 GRAB 200 LINK 800

Use DROP only when you need to move EXA somewhere else without this file or when you need to grab second file
GRAB 300 COPY F X DROP LINK 800

Use HALT only when you have condition branches or REPLicated exas and second branch can be executed without exception (exceptions handling described in next paragraph).
Example: get variable from radio and put positive modulus to file:
COPY M X TEST X > 0 FJMP NEGATIVE COPY X F HALT MARK NEGATIVE MULI X -1 F
Here without HALT both branches are going to be executed

Exceptions are your friends!

When there is an runtime error:
  • no such link/file/hardware
  • grab second file without dropping the first
  • reading from end of file
  • division by zero
EXA is killed.
Let's use it.

Reading from file:
GRAB 200 MARK LOOP COPY F M TEST EOF FJMP LOOP
Trying to read file at EOF causes exception and EXA halts:
GRAB 200 MARK LOOP COPY F M JUMP LOOP
New code snipppet does the same. Also it is greatly decreases cycle count.

Grabbing file without dropping it:
REPL A GRAB 200 MARK A GRAB 201
If first EXA doesn't drop file 200, it will do the job then reach mark A and crash due to access exception

No file/link/hardware
REPL A LINK 801 NOTE SOME WORK MARK A GRAB 200
or
NOTE SOME CONDITION TJMP A LINK 801 NOTE SOME WORK MARK A LINK 802 NOTE SOME WORK
In those snippets it is unlikely that node 801 has link to 802 or file 200 so EXA is going to crash after reaching mark A.
However swapping nodes might require HALT:
REPL A NOTE SOME WORK HALT MARK A LINK 801 NOTE SOME WORK
After reaching mark A, EXA might be able to go to 801 and mess things up. So put some attention which part of code to use in condition branch or REPL.

Replace conditions to arithmetical clamps
When value exceeds 9999 or -9999 it is clamped. We can use it
MARK L1 COPY #NERV X TEST X > 50 TJMP H TEST X < -120 TJMP L COPY X M JUMP L1 MARK L COPY -120 M JUMP L1 MARK H COPY 50 M JUMP L1
can be effectively replaced to
ADDI 9949 #NERV X SUBI X 9949 X SUBI X 9879 X ADDI X 9879 M
which greatly decreases size and cycles count.

Save movements with REPL
E.g. node structure is like 1=>1=>1=>1
So 2 EXAs
LINK 1 LINK 1 LINK 1 NOTE WORK 1
LINK 1 LINK 1 LINK 1 LINK 1 NOTE WORK 2
can be replaced to:
LINK 1 LINK 1 LINK 1 REPL A LINK 1 NOTE WORK 2 MARK A NOTE WORK 1
saving 1 line of code with proper exception trick

Try to use raw T value instead of extra test instruction

Let's assume we need to break when get 0
COPY M X TEST X = 0 TJMP END NOTE SOME WORK HALT MARK END HALT
Equals to this snippet
COPY M T FJMP END NOTE SOME WORK MARK END
Reason: when T is 0, FJMP works. When T is something else (include negative values like -1) TJMP works. TEST instruction overrides T to 0 or 1.

Note: HALT is removed because if mark is on the last line, HALT will be automatic. Also for non 0 value code will reach END mark but do nothing, so HALT can be omitted.

This also reduces cycles count, especially in loops.
Cycles count
The most interesting part. Main goal here is decreasing operations in loops which saves a lot of ticks.
However there are also some tricks to save couple of cycles here and there. First of all read previous chapter carefully because LoC Size optimisations often help to decrease cycle count as well.

Let's discuss cycles optimisation now.

Use @REP macros when possible
It works only when you know exactly how many cycles are required.

Send5 entries from file
GRAB 300 @REP 5 COPY F M @END

Write 0,1,2..10
MAKE @REP 10 COPY @{0, 1} F @END

Travel 800=>801=>802=>803=>804
@REP 5 LINK @{800, 1} @END

Write 10, 8, 6 , 4, 2, 0
MAKE @REP 5 COPY @{10, -2} F @END

However when you start your program, @REP N will be replaced to N x block size lines of code.
Also you can't use nested @REP.
Each program has LoC Limit. If @REP adds too much code, your solution can't be compared with leaderboard.
This solution beats top percentile in terms of cycles count:
But it can't be counted because after launching there are 110 LoCs, but allowed size is 50 LoC.
However you have to replace smaller cycles which have known limits to @REP for improving cycle count metrics.

Use T register value instead of TEST when possible

Write to file N,N-1..1,0 (N from another EXA):
ADDI M 1 T MARK LOOP SUBI T 1 T COPY T F TJMP LOOP
To be honest, it is not the most optimal solution, can be optimised for 1 more cycle :D
COPY M T MARK LOOP COPY T F SUBI T 1 T TJMP LOOP COPY 0 F
So this loop consumes 3 cycles per item. Loop with TEST would consume 4 cycles.

Reverse cycles to keep using T instead of TEST

Write to file 0, 1 .. N (N from another EXA):
MAKE ADDI M 1 T SUBI T 1 X MARK LOOP SUBI T 1 T SUBI X T F TJMP LOOP
Still 3 cycles per item.

Use your EXA as procedure (subroutine)!
Assume you have some independent tasks for your EXA which are using same X or T initial data
NOTE LONG WORK 1 NOTE LONG WORK 1
can be paralleled into
REPL A NOTE LONG WORK 1 HALT MARK A NOTE LONG WORK 1
2 EXAs will do tasks much faster.

Process any loop item in 3 cycles

Now it is a time for heavy optimisations.
Let's return to our loops but change task from writing to file to transmitting data through M:
ADDI M 1 F MARK LOOP SUBI T 1 T COPY T M NOTE MORE WORK TJMP LOOP
Copying to M takes 2 cycles. That's why loop item is processed for 4 cycles in total. Adding more instructions increases cycle count heavily, making programs really slow.
Solution is to split cycle from actual work thus executing jobs in parallel mode.
COPY M F MARK LOOP REPL WORK SUBI T 1 T TJMP LOOP MARK WORK COPY T M NOTE MORE WORK
Here we are using REPLicated EXA as procedure (subroutine). Loop entry consumes 3 cycles and only handles looping, all calculations are performed in parallel by another EXA.
This method is easy to control and handle. But you migh need to eventually move clones to another node in case when work takes 3+N (N is free cell count in the room) cycles or more because producer (loop) becomes faster than consumer (current node) and current node can be filled with EXAs causing loop to wait when they are move away or halt themselves.

Process any loop item in 2 cycles!
Okay 3 cycles for loop is good but I need to process data FASTER! I did previous trick but histagrams show much better results. Can you help me??
Yes, there are 2 ways of making 2 cycle loops: stateful and stateless.
Stateful:
MARK LOOP ADDI X 1 X REPL LOOP NOTE REST OF WORK
Stateless:
MARK LOOP REPL WORK JUMP LOOP MARK WORK NOTE REST OF WORK
First one is more useful but harder to handle. Second one is mostly useless but a bit easier to kill.
Both methods require another EXA with killer role. Killer must count how many cycles should this producer live and than kill it. Typical killer code:
COPY 26 T MARK L NOOP SUBI T 1 T TJMP L NOOP KILL KILL
It waits until unmanaged loop does all its job and then kills replicating EXA and it's last child.
Also EXAs have to move away from respawn point more actively in case when work takes 2+N (N is free cell count in the room) cycles or more otherwise process will slow down due to EXAs overflow.

Process any loop item in just 1 cycle!
I wanna be a SPEED DEMON like top percentile guys in chart! Is it possible to run things in real time?
Actually, yes:
MARK WORK REPL WORK NOTE REST OF WORK
EXA firstly creates new stateless copy and then thinks what to do next.
Using this approach you might need more than one killer. Also your EXAs should constantly move forward: Each operation within the node (including LINK to that node) occupies one free cell so if room has 4 free cells you shouldn't do more than 3 ops within it.
Example:
You might notice that this is not top percentile. So here you are! However difference is barely visible but this extra one cycle costs one extra EXA.

The most annoying optimisation

Previous approach can't work in some cases e.g. when you need to write to file. However it is possible to reduce loop execution time and make it close to 1 cycle per entry.
Solution is to split big (e.g 100+ loop) into tens and ones. Then process tens using @REP and ones using normal loop.

DIVI X 10 T FJMP REST MARK T @REP 10 COPY 75 F @END SUBI T 1 T TJMP T MARK REST MODI X 10 T FJMP END MARK L3 SUBI T 1 T COPY 75 F TJMP L3 MARK END
Note: handlers of tens and ones can be swapped

The problem here is that effectivenes of this approach depends on test data so you may need to change 10 in all 3 places to another value and empirically find out the best constant for this particular tests set.
2 Comments
NEET_™ 10 Oct @ 3:29am 
amazing
Zbra 22 Jul, 2023 @ 9:31pm 
I'm shocked nobody commented your incredible work!

Thanks a lot for all these explanations! I was wondering how to further optimize my stuff and some of the techniques you showed really opened my eyes!!
The shear amount of techniques you posted is staggering!

Thanks a lot for your time writing this :p2cube: