Explanations
proper nouns, often related to people or specific entities
occurrences of proper nouns and specific names
Negative Logits
Token        Logit
ĺħ           -0.70
hostages     -0.69
tremend      -0.69
²¾           -0.69
ģ«           -0.67
hindsight    -0.65
Skydragon    -0.62
captivity    -0.61
impulse      -0.60
¿½           -0.60
Positive Logits
Token        Logit
akeru        0.88
ihu          0.80
rescent      0.71
dale         0.68
erella       0.68
ounter       0.68
aste         0.67
eret         0.66
Nieto        0.66
utor         0.66
Activations Density 0.118%