Explanations
proper nouns and their associated attributes, particularly names related to places and individuals
Negative Logits
ourn
-0.16
果
-0.15
Sheriff
-0.14
progress
-0.14
encer
-0.14
-0.14
enco
-0.14
Bruno
-0.13
му
-0.13
resp
-0.13
Positive Logits
shaw
0.19
OVE
0.18
olulu
0.16
ingham
0.15
еро
0.15
elow
0.14
таж
0.14
ove
0.14
imas
0.14
felt
0.14
Activations Density 0.083%