INDEX
Explanations
references to specific populations or groups in societal contexts
followed by prepositions or "the"
actions and dependencies
Negative Logits
is         -0.36
some       -0.35
itself     -0.35
auszu      -0.34
is         -0.34
einiges    -0.33
if         -0.33
again      -0.33
head       -0.32
case       -0.32
Positive Logits
którzy           0.86
kteří            0.75
queſta           0.74
Personendaten    0.73
ktorí            0.73
Controllo        0.72
Normdatei        0.71
marinho          0.67
незавершена      0.65
OGND             0.65
Activations Density: 0.678%