INDEX
Explanations
phrases emphasizing the absence or negation of something
phrases emphasizing the lack of evidence or the universal applicability of a statement
Negative Logits
Module      -0.61
pat         -0.61
NES         -0.60
elfth       -0.59
ÅĤ          -0.59
emn         -0.59
rig         -0.58
itely       -0.57
erb         -0.54
kers        -0.53
POSITIVE LOGITS
except      0.76
else        0.74
alike       0.72
.           0.70
agher       0.68
--          0.68
imaginable  0.67
soever      0.67
ãĢĤ         0.66
-           0.65
Activations Density 0.153%
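
The density figure above can be read as the share of token positions on which the feature fires. The page does not state how it is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the toy data are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means firing frequency over a token sample.
    The dashboard itself does not document its definition.
    """
    if len(activations) == 0:
        raise ValueError("activations must contain at least one value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy sample: 10,000 token positions, 15 of which activate.
toy_activations = [0.0] * 9985 + [1.2] * 15
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.150%
```

Under this reading, a density of 0.153% would mean the feature is active on roughly 1 or 2 of every 1,000 tokens sampled.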