Explanations
names and references to individuals or entities in the text
Negative Logits
zelf       -0.19
ijo        -0.18
UFFIX      -0.16
erm        -0.16
ities      -0.16
INGS       -0.15
Agility    -0.15
ovice      -0.15
bins       -0.15
roud       -0.15
Positive Logits
icut       0.19
ion        0.18
atic       0.17
icult      0.16
al         0.16
từng       0.16
ось        0.16
be         0.16
/ref       0.15
نامه       0.15
Activations Density 0.380%