Explanations
tokens related to personal or proper names
proper nouns or names of people
Negative Logits
nered     -0.95
lain      -0.90
rug       -0.89
hof       -0.84
agher     -0.83
lich      -0.80
rums      -0.78
role      -0.78
rarily    -0.78
furt      -0.78

Positive Logits
sie        0.77
lectic     0.66
IVER       0.63
payday     0.61
ESSION     0.61
ox         0.59
LECT       0.59
FINEST     0.58
opioids    0.58
Prescott   0.56
Activation density: 0.161%
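A density of 0.161% means the feature is active on roughly 1.6 of every 1,000 tokens. The page does not say how its numbers are produced. The sketch below assumes the usual convention for sparse-autoencoder feature dashboards: each token's logit value is the dot product of the feature's decoder direction with that token's unembedding column, and the lists show the k lowest and k highest values. Activation density is taken to be the percentage of token positions where the feature's activation is greater than zero. The function names and toy vectors are hypothetical and for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float], unembed: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Assumed convention: project the feature's decoder direction onto
    each token's unembedding column (a plain dot product)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens, each ordered
    from most extreme to least extreme."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[:k], ranked[::-1][:k]


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of token positions where the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    return 100.0 * sum(1 for a in activations if a > 0) / len(activations)


# Hypothetical toy data: a 2-dimensional residual stream and three tokens.
decoder_dir = [1.0, -0.5]
unembed = {"a": [0.2, 0.4], "b": [-0.3, 0.1], "c": [0.5, -0.2]}

effects = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(effects, k=1)
density = activation_density([0.0, 1.2, 0.0, 0.0])
```

With these toy inputs, token "b" has the most negative value, "c" has the most positive, and the feature fires on one of four positions, which gives a density of 25%. A real dashboard would apply the same ranking over the full vocabulary and compute density over a large token sample.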