INDEX
Explanations
discourse markers or expressions that indicate contrast or transitions in thought
Negative Logits
ans       -0.15
_joint    -0.14
rea       -0.14
ansa      -0.14
otec      -0.14
ycler     -0.14
lix       -0.14
quals     -0.14
fel       -0.14
vey       -0.13
Positive Logits
cala      0.15
okane     0.15
codegen   0.15
tml       0.15
bum       0.14
enas      0.14
ieux      0.14
rych      0.14
éIJĺ      0.14
aces      0.14
Activations Density 0.000%