Explanations
phrases indicating a contrast or a negative condition
the word "neither" and its variations in context
Negative Logits
enges  -0.66
tri  -0.65
roxy  -0.65
ÙĴ  -0.65
duc  -0.65
ournals  -0.63
agonal  -0.63
Bang  -0.63
enos  -0.63
è¯  -0.62
Positive Logits
theless  0.74
llor  0.74
ndra  0.70
shalt  0.69
lect  0.68
soever  0.67
overtly  0.67
wegian  0.66
!--  0.66
necessarily  0.66
Activation Density: 0.015%