Explanations
names of individuals and significant historical figures
Negative Logits
Token         Logit
%)$           -0.77
ÉM            -0.75
siez          -0.71
ViewImports   -0.70
omation       -0.68
oine          -0.68
bode          -0.68
eaway         -0.67
."</          -0.65
INTEN         -0.65
Positive Logits
Token         Logit
himself       0.91
'             0.79
’             0.68
Himself       0.62
himself       0.61
ssohn         0.59
Seeder        0.55
who           0.54
nødven        0.54
desnuda       0.54
Activations Density 0.339%
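
For readers unfamiliar with the density figure above, the following is a minimal sketch of one common way such a number is obtained: the fraction of sampled tokens on which the feature fires. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical; the source does not say how its 0.339% value was computed, so treat this as an illustration under that assumption rather than the dashboard's actual method.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active. The source does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up toy data: the feature fires on 3 of 1000 tokens.
toy = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this definition, a density of 0.339% would correspond to the feature being active on roughly 3 or 4 tokens per thousand sampled.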