INDEX
Explanations
occurrences of contractions with "n't"
expressions of prohibition or negation
Negative Logits
veins        -0.74
continents   -0.65
NetMessage   -0.64
vein         -0.63
trails       -0.62
landscapes   -0.60
hole         -0.58
holes        -0.58
cores        -0.58
Wid          -0.58
Positive Logits
emulate      0.88
arent        0.75
reated       0.72
disclose     0.71
regnancy     0.71
behave       0.70
ilege        0.69
be           0.69
imize        0.69
ention       0.69
Activations Density: 0.062%