INDEX
Explanations
proper nouns
mentions of the name "Mark."
Negative Logits
ught       -0.80
unnecess   -0.75
vain       -0.71
disposed   -0.70
deliber    -0.70
urses      -0.69
gladly     -0.69
politic    -0.67
女         -0.67
traged     -0.67
Positive Logits
eting      1.07
Mark       1.06
Twain      1.03
mark       0.96
furt       0.95
owitz      0.93
down       0.91
emark      0.91
marks      0.88
Mark       0.83
Activations Density 0.009%