INDEX
Explanations
phrases related to taking action or responding to a situation
negations or phrases expressing refusal or denial
Negative Logits
akedown              -0.74
ailability           -0.72
bda                  -0.64
lishes               -0.63
retty                -0.63
ongevity             -0.61
pend                 -0.58
natureconservancy    -0.58
uncture              -0.56
ravel                -0.55
Positive Logits
to                   1.29
unto                 1.15
aloud                1.11
to                   1.04
verbally             1.01
TO                   0.94
thereto              0.90
anonymously          0.89
publicly             0.88
towards              0.88
Activations Density: 0.375%