Explanations
expressions of surprise or exclamation
Negative Logits
myſelf        -1.09
himſelf       -0.92
themſelves    -0.88
houſe         -0.88
itſelf        -0.86
raiſ          -0.83
himself       -0.79
Houſe         -0.76
Efq           -0.76
perſon        -0.74
Positive Logits
Oh            1.29
Oh            1.18
oh            1.10
oh            0.99
Ohh           0.98
OH            0.98
Ohhh          0.94
Oooh          0.90
Ohhhh         0.89
toh           0.85
Activations Density 0.044%