INDEX
Explanations
conditional phrases and questions
Negative Logits
ldr        -0.19
atters     -0.18
ALSE       -0.16
nika       -0.15
}elseif    -0.15
gin        -0.14
ged        -0.14
erno       -0.14
assing     -0.14
ourd       -0.13
Positive Logits
merely     0.18
deaux      0.15
izin       0.15
åıªæĺ¯     0.15
SHIP       0.15
yoksa      0.15
just       0.14
ever       0.14
930        0.14
îł         0.14
Activations Density 0.030%