Explanations
proper nouns, particularly names like "Boris"
the name "Boris" and its variations in the context of political discussions
Negative Logits
prise           -0.71
venge           -0.71
etermination    -0.71
intent          -0.68
mble            -0.66
icago           -0.65
imony           -0.65
ppelin          -0.65
WORK            -0.65
ocrine          -0.65
Positive Logits
Boris            1.12
Yel              1.00
ovich            0.93
Dia              0.90
Johnson          0.86
Nem              0.86
achev            0.83
sov              0.83
Bere             0.82
Karl             0.81
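The two logit lists can be read as the feature's direct effect on the output vocabulary. A common way feature dashboards produce such lists is to multiply the feature's decoder vector by the model's unembedding matrix and rank the resulting per-token scores. The sketch below assumes that approach; it is not confirmed for this particular page. The function name `feature_logit_effects`, the array shapes, and the toy weights in the usage are all hypothetical, and the toy values are made up rather than taken from the real model.

```python
import numpy as np

def feature_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumed (hypothetical) shapes:
      w_dec_row: (d_model,)          the feature's decoder direction
      w_u:       (d_model, d_vocab)  the model's unembedding matrix
      vocab:     list of d_vocab token strings
    Returns (top_positive, top_negative), each a list of (token, value).
    """
    logits = np.asarray(w_dec_row) @ np.asarray(w_u)  # shape (d_vocab,)
    order = np.argsort(logits)                        # ascending order
    top_negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    top_positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top_positive, top_negative


if __name__ == "__main__":
    # Toy weights chosen by hand for illustration only (not real model data).
    vocab = ["Boris", "prise", "Yel", "icago"]
    w_dec_row = np.array([1.0, 0.0])
    w_u = np.array([[1.12, -0.71, 1.00, -0.65],
                    [0.30, 0.20, -0.40, 0.10]])
    pos, neg = feature_logit_effects(w_dec_row, w_u, vocab, k=2)
    print(pos)  # [('Boris', 1.12), ('Yel', 1.0)]
    print(neg)  # [('prise', -0.71), ('icago', -0.65)]
```

Ranking raw decoder-times-unembedding scores ignores any final layer norm and downstream layers, so lists built this way describe only the feature's direct path to the logits.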
Activation Density: 0.012%
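Activation density is typically the fraction of tokens in a sample dataset on which the feature fires, so 0.012% corresponds to roughly 12 tokens in every 100,000. A minimal sketch, assuming one activation value per token and treating "fires" as an activation above a hypothetical `threshold` of 0; the function name `activation_density` and the activations in the usage are illustrative, not taken from this dashboard.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    `acts` holds one activation value per token; counting a token as
    active when its value is > 0 is an assumption, not confirmed here.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


if __name__ == "__main__":
    # Hypothetical activations: 12 of 100,000 tokens fire.
    acts = np.zeros(100_000)
    acts[:12] = 3.5
    density = activation_density(acts)
    print(f"{density:.3%}")  # 0.012%
```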