Explanations
the proper nouns, particularly names and locations
Negative Logits
tual     -0.18
upal     -0.17
Ñĥ       -0.17
ĶåĽŀ     -0.16
bower    -0.16
ña       -0.16
erton    -0.15
es       -0.15
LOW      -0.15
uro      -0.15
Positive Logits
ymous    0.24
avirus   0.21
imbus    0.20
ically   0.20
ettes    0.19
che      0.18
ise      0.17
script   0.17
ics      0.17
ENTIAL   0.17
Activations Density: 0.158%