INDEX
Explanations
phrases indicating contrast or unexpected outcomes
the word "even" in various contexts
Negative Logits
ffen        -0.90
aim         -0.88
isen        -0.85
rend        -0.83
Ĥİ          -0.82
idelines    -0.81
ierrez      -0.80
rition      -0.79
acker       -0.78
ayers       -0.78
Positive Logits
remotely    0.90
though      0.82
outright    0.78
handedly    0.76
stranger    0.76
worse       0.75
tho         0.69
paran       0.67
downright   0.67
monary      0.64
Activations Density 0.059%
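
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "density" means the fraction of sampled tokens on which the feature has a nonzero activation; the page itself does not define the term. The function name `activation_density` and the sample data are hypothetical and exist only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (number of tokens).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 59 of which activate the feature.
sample = [1.0] * 59 + [0.0] * (100_000 - 59)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.059%
```

Under that assumption, a density of 0.059% corresponds to roughly 59 active tokens per 100,000 sampled.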