INDEX
Explanations
phrases indicating prohibition or negation
negations about what should or shouldn't occur
Negative Logits
Token          Logit
FW             -0.66
Via            -0.65
Hungry         -0.64
Wid            -0.64
Panasonic      -0.63
hole           -0.62
irresistible   -0.61
LIN            -0.60
Planes         -0.60
marvelous      -0.59
Positive Logits
Token          Logit
emulate        0.88
be             0.82
regnancy       0.78
reated         0.77
imize          0.76
imitate        0.73
arent          0.73
erest          0.72
necessarily    0.69
behave         0.69
Activations Density: 0.090%