Explanations
statements that express observations or insights about various subjects
Negative Logits
avic        -0.15
bla         -0.14
Others      -0.14
elf         -0.14
цо          -0.14
کنان        -0.13
perch       -0.13
ж           -0.13
ilent       -0.13
ifa         -0.13
Positive Logits
constant    0.17
acker       0.17
악          0.16
tlement     0.15
certainty   0.15
itious      0.15
nota        0.15
strike      0.15
vester      0.15
consistent  0.15
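The two logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A minimal sketch, assuming the common SAE-dashboard convention of taking the dot product between the feature's decoder direction and each token's unembedding vector, then keeping the k lowest and k highest scores. The helper name `logit_effects`, the 3-dimensional vectors, and the 4-token vocabulary are hypothetical toy values, not this dashboard's actual code or data.

```python
# Hypothetical sketch: score each token by the dot product of a feature's
# decoder direction with that token's unembedding vector, then rank.
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, score) pairs."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # lowest scores, most negative first
    positive = ranked[::-1][:k]  # highest scores, most positive first
    return negative, positive


# Toy residual stream of width 3 and a made-up 4-token vocabulary.
decoder_dir = [0.5, -0.2, 0.1]
unembed = {
    "constant": [0.3, -0.1, 0.2],
    "certainty": [0.2, 0.0, 0.1],
    "avic": [-0.3, 0.2, 0.0],
    "bla": [-0.1, 0.3, -0.1],
}
neg, pos = logit_effects(decoder_dir, unembed, k=2)
print([tok for tok, _ in neg])  # ['avic', 'bla']
print([tok for tok, _ in pos])  # ['constant', 'certainty']
```

A real vocabulary has tens of thousands of entries, so a practical version would use a matrix library instead of Python loops; the ranking logic stays the same.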
Activation Density: 0.075%
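Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is strictly positive. A minimal sketch under that assumption; the helper name `activation_density` and the activation values are made up for illustration. At 0.075%, the feature would fire on roughly 3 of every 4,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Made-up example: the feature fires on 3 of 4,000 sampled tokens.
acts = [0.0] * 4000
for i in (10, 1500, 3999):
    acts[i] = 1.2
print(f"{activation_density(acts):.3f}%")  # prints 0.075%
```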