Explanations
statements of identity and self-description
Negative Logits
Token        Value
ok           -0.17
759          -0.15
åIJ§         -0.15
böylece      -0.15
therefore    -0.15
812          -0.15
.Dao         -0.14
however      -0.14
acha         -0.14
Either       -0.14
Positive Logits
Token        Value
only         0.20
åıªæĺ¯       0.18
only         0.17
далеко       0.17
Only         0.16
deaux        0.15
âīł          0.15
Only         0.15
/Peak        0.15
’ÑıÑĤ        0.15
Activations Density: 0.234%