Explanations
references to individuals or groups of people
Negative Logits
Token     Logit
(es       -0.22
itself    -0.17
wner      -0.16
U+FE0F    -0.15  (variation selector-16, an invisible character)
人物      -0.15
berg      -0.15
undi      -0.15
율        -0.15
stadt     -0.15
ayne      -0.15
Positive Logits
Token     Logit
who       0.30
/entities 0.24
who       0.23
whom      0.23
/groups   0.20
Who       0.20
الذين     0.20
士        0.19
hood      0.19
age       0.19
Activation Density: 0.113%