INDEX
Explanations
proper nouns related to political figures and locations
Negative Logits
Token       Logit
itiveness   -0.62
rontal      -0.61
Dealer      -0.59
rocal       -0.59
prem        -0.58
Freak       -0.58
akura       -0.56
elig        -0.56
hereafter   -0.55
Filename    -0.55
Positive Logits
Token   Logit
steen   0.94
phia    0.87
ande    0.75
ith     0.75
ando    0.70
endi    0.70
jah     0.67
ieu     0.67
stad    0.67
umen    0.66
Activations Density: 0.260%