INDEX
Explanations
locations or proper nouns related to specific places
references to specific names and places
Negative Logits
onut     -0.88
icol     -0.80
umn      -0.77
µ        -0.74
©¶æ      -0.69
esity    -0.69
lasses   -0.66
icho     -0.64
çīĪ      -0.63
sam      -0.63
Positive Logits
Lann     0.94
ion      0.90
enger    0.87
afort    0.84
inian    0.76
igate    0.71
lain     0.70
yards    0.68
ribut    0.66
Malfoy   0.66
Activations Density: 0.046%