INDEX
Explanations
names of specific individuals, particularly politicians
mentions of specific individuals, particularly politicians and notable figures
Negative Logits
ters  -0.78
phrine  -0.78
ching  -0.77
ths  -0.73
cise  -0.72
brance  -0.71
omial  -0.67
htt  -0.67
monds  -0.67
yright  -0.65
Positive Logits
ozy  0.93
inian  0.86
olini  0.83
kas  0.77
anye  0.76
schild  0.75
anski  0.75
lain  0.73
ika  0.72
inia  0.71
Activation Density: 0.078%
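The sketch below shows how numbers like the ones above are commonly produced. It is not confirmed by this page. It assumes, as is typical for sparse-autoencoder feature dashboards, that the logit lists rank vocabulary tokens by the projection of the feature's decoder direction through the model's unembedding matrix, and that activation density is the percentage of tokens on which the feature's activation exceeds zero. Real dashboards may also apply normalization (for example, folding in the final layer norm), which is omitted here. The function names, toy matrices, and token strings are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix (d_model x vocab) to get one score per token.
    Assumption: no layer-norm scaling or centering is applied."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][v] for i in range(len(decoder_row)))
        for v in range(vocab_size)
    ]


def top_tokens(effects: Sequence[float], tokens: Sequence[str], k: int,
               largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive list) or smallest
    (negative list, most negative first) logit effects."""
    order = sorted(range(len(effects)), key=lambda v: effects[v], reverse=largest)
    return [(tokens[v], effects[v]) for v in order[:k]]


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens where the feature activation exceeds the threshold."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 3 tokens.
    decoder_row = [1.0, -0.5]
    unembed = [[0.2, -0.4, 0.9],
               [0.1, 0.6, -0.2]]
    tokens = ["a", "b", "c"]

    effects = logit_effects(decoder_row, unembed)
    print("positive:", top_tokens(effects, tokens, k=1, largest=True))
    print("negative:", top_tokens(effects, tokens, k=1, largest=False))
    print("density %:", activation_density([0.0, 0.3, 0.0, 0.0, 1.2]))
```

Sorting the whole vocabulary is fine for a toy example; for a real vocabulary of tens of thousands of tokens, `heapq.nlargest` or `heapq.nsmallest` avoids a full sort.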