INDEX
Explanations
symbols and unusual characters in the text
Negative Logits
agate       -0.17
irma        -0.16
.Restrict   -0.15
_atomic     -0.14
_ATOM       -0.14
SAT         -0.14
izard       -0.14
spender     -0.14
orning      -0.14
менÑĪ       -0.14
Positive Logits
hit         0.18
Fair        0.17
heat        0.16
therm       0.16
Kurul       0.16
Mac         0.15
Ruf         0.15
ither       0.15
fair        0.15
Mitchell    0.15
Activations Density 0.005%