INDEX
Explanations
sentences with phrases related to disagreement or conflict
words and phrases indicating contrast or conditions
Negative Logits
Token          Logit
'/             -0.67
reinforcement  -0.62
arsen          -0.61
Cheong         -0.60
fasc           -0.60
Shepard        -0.60
Aus            -0.60
curing         -0.60
Kro            -0.59
Rath           -0.59
Positive Logits
Token          Logit
theless         1.62
etheless        1.60
terday          1.25
withstanding    1.15
usterity        1.11
foundland       1.10
selves          1.10
odore           1.07
tenance         1.05
bsite           1.03
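The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The dashboard does not say how the values were computed; a common approach, assumed here rather than confirmed by the source, is the dot product of the feature's decoder direction with each token's unembedding vector, followed by a sort. The minimal sketch below uses hypothetical function names and toy 3-dimensional vectors; it does not reproduce the numbers above.

```python
# Hypothetical sketch: ASSUMES a feature's direct logit effect on token t is the
# dot product of the feature's decoder direction with t's unembedding vector.

def logit_effects(decoder_dir, unembed_cols):
    """Return {token: effect}, where effect = <decoder_dir, unembedding of token>."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed_cols.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy decoder direction and unembedding vectors (illustrative values only).
    decoder = [1.0, -0.5, 0.0]
    unembed = {
        "theless": [1.5, -0.2, 0.3],
        "reinforcement": [-0.4, 0.5, 0.1],
        "selves": [0.8, 0.0, 0.0],
        "Kro": [0.0, 1.0, 2.0],
    }
    pos, neg = top_and_bottom(logit_effects(decoder, unembed), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

With these toy vectors, "theless" and "selves" come out as the top positive tokens and "reinforcement" and "Kro" as the top negative ones. Real dashboards may also fold in layer-norm scaling or other details this sketch ignores.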
Activation Density: 0.223%
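Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes, since the dashboard does not define it, that "fires" means an activation strictly above a threshold of 0 and that the result is reported as a percentage. The function name is hypothetical. For example, 223 firing positions out of 100,000 would give the 0.223% shown; that split is a worked arithmetic example, not the dashboard's actual sample.

```python
# Hypothetical sketch: ASSUMES density = percentage of token positions whose
# feature activation is strictly greater than `threshold` (default 0.0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions where the activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Worked example: 223 nonzero activations among 100,000 positions.
    acts = [0.0] * 99_777 + [0.5] * 223
    print(f"{activation_density(acts):.3f}%")
```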