Explanations
references to "people" and their actions
Negative Logits
Token    Logit
(es      -0.19
ï¸ı      -0.18
wner     -0.17
Ùij      -0.16
engin    -0.15
οποίο    -0.15
stadt    -0.15
czy      -0.14
čel      -0.14
ptime    -0.14
Positive Logits
Token       Logit
who         0.47
whom        0.37
who         0.34
/entities   0.31
's          0.31
/groups     0.31
whose       0.29
Who         0.27
الذين       0.27
’s          0.25
Activations Density 0.111%