INDEX
Explanations
phrases expressing emphasis or contradiction
negations or expressions of denial
Negative logits
eleph          -1.04
pione          -1.03
ò              -1.01
aditional      -0.98
ortunately     -0.97
metic          -0.96
Þ              -0.94
ö              -0.92
practition     -0.92
ą              -0.91
Positive logits
't              1.70
´               1.03
\'              0.93
uts             0.89
ÃŃ              0.86
`               0.85
�               0.84
Õ               0.77
̶               0.73
bryce           0.72
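Logit lists like the two above are commonly obtained by projecting a feature's decoder direction onto every token's unembedding vector and sorting the resulting scores. The sketch below shows that idea in pure Python. It assumes the dashboard uses this direct projection, which the source does not state, and the function name `feature_logit_effects` and the toy vocabulary are hypothetical.

```python
from typing import Dict, List, Tuple


def feature_logit_effects(
    decoder_direction: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive token logit effects.

    Hypothetical helper: assumes the effect of a feature on a token's logit
    is the dot product of the feature's decoder direction with that token's
    unembedding vector.
    """
    # Score every token by its dot product with the decoder direction.
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in unembed.items()
    }
    # Most negative first for the "negative logits" list.
    negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    # Most positive first for the "positive logits" list.
    positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive


# Toy example with a made-up 2-dimensional unembedding.
toy_unembed = {"'t": [1.0, 0.5], "eleph": [-1.0, 0.0], "the": [0.1, 0.1]}
neg, pos = feature_logit_effects([1.0, 0.2], toy_unembed, k=1)
print(neg, pos)
```

With a real model the vectors would come from the model's weights, and the decoder direction is often normalized first. That detail is left out here.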
Activation density: 0.114%
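Activation density is usually read as the share of token positions on which the feature fires. The following is a minimal sketch, assuming "fires" means an activation strictly above a threshold of 0. The source does not give the exact definition or threshold, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: a position counts as active when its activation is > threshold.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# 114 active positions out of 100,000 corresponds to a density of 0.114%.
acts = [0.0] * (100_000 - 114) + [1.5] * 114
print(f"{100 * activation_density(acts):.3f}%")
```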