Explanations
adjectives related to physical closeness or tightness
descriptions of constrained or limited situations
Negative Logits
ulhu            -0.81
代              -0.74
ablishment      -0.72
Twain           -0.71
ividual         -0.70
ista            -0.69
illery          -0.68
icative         -0.68
Courage         -0.67
;;;;;;;;;;;;    -0.67
Positive Logits
ness            1.08
nesses          1.03
lining          0.85
tight           0.83
squeeze         0.80
fitting         0.79
heed            0.79
est             0.78
tail            0.76
weed            0.76
Activations Density: 0.017%