INDEX
Explanations
proper names and entities likely related to a specific location or organization
proper nouns, particularly names and locations
Negative Logits
Spectre   -0.69
Lev       -0.67
Bound     -0.65
Ruth      -0.64
Corona    -0.63
Score     -0.63
Ber       -0.62
Phelps    -0.61
Berger    -0.61
Brave     -0.61
Positive Logits
sembly    0.92
colm      0.91
ional     0.90
nam       0.89
nas       0.88
ignty     0.87
ctory     0.86
enei      0.86
nis       0.84
iment     0.84
Activations Density: 0.022%