INDEX
Explanations
proper nouns, specifically names of individuals or entities
proper nouns, specifically names of people and organizations
Negative Logits
ires        -0.82
auga        -0.78
ECH         -0.74
Jakarta     -0.71
Samantha    -0.70
onga        -0.70
urtle       -0.70
ansas       -0.69
IRED        -0.69
Armenia     -0.69
Positive Logits
Frey            0.90
swer            0.81
_{              0.80
flush           0.80
cipled          0.79
Reply           0.77
vous            0.75
ezvous          0.75
interstitial    0.75
tag             0.74
Activations Density 0.016%