INDEX
Explanations
phrases related to proper nouns, particularly those related to political figures
references to the word "cond" and its variations in different contexts
Negative Logits
chy       -0.70
kaya      -0.69
ngth      -0.68
ment      -0.64
ku        -0.63
manship   -0.63
DW        -0.63
Sham      -0.62
DIR       -0.60
bul       -0.60
Positive Logits
ominium   1.20
enser     1.19
ensation  1.13
ensed     1.12
ensing    1.04
uctor     1.00
itions    0.99
itional   0.97
uits      0.93
secut     0.86
Activations Density: 0.028%