INDEX
Explanations
phrases indicating denial or negation
negations and phrases indicating a lack of agreement or acceptance
Negative Logits
inus      -0.70
thus      -0.66
folios    -0.64
bath      -0.62
shown     -0.62
akeru     -0.61
sold      -0.60
press     -0.60
ubi       -0.58
rush      -0.58
Positive Logits
ones           0.68
theoretically  0.63
othes          0.62
foreseeable    0.61
Spoiler        0.61
vised          0.59
ptroller       0.59
hap            0.58
lawyers        0.57
plom           0.57
Activations Density: 0.066%