INDEX
Explanations
category labels or classifications within the text
Negative Logits
šť  -0.16
гÑĥ  -0.16
iki  -0.15
aida  -0.14
indi  -0.14
_GU  -0.14
lingen  -0.14
ittings  -0.14
bow  -0.14
Winds  -0.13
Positive Logits
é¾  0.17
Clarkson  0.15
fat  0.15
iton  0.15
ën  0.14
Ta  0.14
thood  0.14
Per  0.14
tu  0.14
FAT  0.14
Activation Density: 0.005%
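A density of 0.005% is 0.00005 as a fraction, or roughly one firing token in every 20,000 positions, so this is a very sparse feature. The sketch below shows that arithmetic. It assumes density means "fraction of token positions where the activation is above zero". That definition is an assumption, not something this page states. The function name `activation_density`, the `threshold` parameter and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds `threshold`.

    Assumption: density is defined as (# positions with activation > threshold) / (# positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical toy data: the feature fires on 1 of 20,000 token positions.
acts = [0.0] * 20_000
acts[1234] = 3.7
density = activation_density(acts)
print(f"{density:.3%}")  # 0.005%
```

Under this definition, a density this low means most text never activates the feature. Its top examples therefore come from a small number of contexts.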