Explanations
alternative approaches to problem-solving
Negative Logits
usto        -0.17
ovna        -0.15
POSIT       -0.15
trunk       -0.15
/Area       -0.14
#           -0.14
토토        -0.14
cia         -0.14
EO          -0.14
ei          -0.14
Positive Logits
justice      0.28
Justice      0.24
justice      0.23
Justice      0.21
wrong        0.21
differently  0.18
-done        0.18
backwards    0.17
proud        0.16
estilo       0.16
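Logit lists like these are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the tokens with the largest and smallest scores. The sketch below assumes that convention. It is not confirmed as the method behind this page. The 3-dimensional direction, the 5-token vocabulary, and all numbers are hypothetical toy values, not this feature's real weights.

```python
# Minimal sketch, assuming logit effects = decoder direction . unembedding column.
# All vectors and the vocabulary below are hypothetical toy values.

def logit_effects(decoder_dir, unembed, vocab):
    """Score each token by the dot product of the feature's decoder
    direction with that token's column of the unembedding matrix."""
    scores = {}
    for j, tok in enumerate(vocab):
        scores[tok] = sum(d * unembed[i][j] for i, d in enumerate(decoder_dir))
    return scores

def top_k(scores, k, largest=True):
    """Return the k (token, score) pairs with the largest (or smallest) scores."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=largest)[:k]

vocab = ["justice", "wrong", "trunk", "cia", "the"]
decoder_dir = [0.5, -0.2, 0.1]  # hypothetical feature direction (d_model = 3)
unembed = [                     # hypothetical d_model x vocab matrix
    [0.4, 0.3, -0.2, -0.1, 0.0],
    [-0.2, 0.1, 0.3, 0.2, 0.0],
    [0.2, 0.1, 0.0, -0.3, 0.1],
]

scores = logit_effects(decoder_dir, unembed, vocab)
positive = top_k(scores, 2, largest=True)   # tokens the feature promotes
negative = top_k(scores, 2, largest=False)  # tokens the feature suppresses
```

With these toy values, "justice" (0.26) and "wrong" (0.14) come out as the top positive tokens, and "trunk" (-0.16) and "cia" (-0.12) as the most negative.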
Activation Density: 0.075%
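Activation density is usually read as the fraction of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The threshold is an assumption, not something stated on this page. At 0.075%, the feature would be active on roughly 75 of every 100,000 tokens. The activation list here is synthetic.

```python
# Minimal sketch, assuming a token "fires" when its activation exceeds the threshold
# (0.0 by default; the threshold is an assumption, not taken from the source).

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Synthetic data: 75 firing tokens out of 100,000.
acts = [1.0] * 75 + [0.0] * 99_925
density = activation_density(acts)
print(f"{density:.3%}")
```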