INDEX
Explanations
negative statements or denials
expressions that deny or negate statements
Negative Logits
anwhile        -0.69
Brief          -0.63
ALLY           -0.63
progressively  -0.62
ationally      -0.62
essentially    -0.62
basically      -0.61
psey           -0.61
ĻĤ             -0.60
ingly          -0.60
Positive Logits
onen       0.70
å¤         0.67
underest   0.63
ibe        0.63
spir       0.63
von        0.61
ettings    0.60
omething   0.60
enthusi    0.59
gged       0.59
Activations Density 0.194%