Explanations
negative expressions or phrases indicating lack or failure
Negative Logits
Token            Logit
cens             -0.68
undrum           -0.68
opausal          -0.65
geries           -0.65
oug              -0.64
endi             -0.63
obo              -0.62
iewicz           -0.61
false            -0.60
understatement   -0.60
Positive Logits
Token            Logit
necessarily      0.96
etheless         0.85
cially           0.77
conclusive       0.76
enough           0.75
nonetheless      0.72
guarantee        0.67
theless          0.65
specifics        0.64
infall           0.62
Activations Density: 0.191%