INDEX
Explanations
adjectives that denote qualities or characteristics
Negative Logits
REFERRED    -0.16
.wp         -0.15
alse        -0.14
Stout       -0.14
amac        -0.14
ittel       -0.14
ruž         -0.14
ture        -0.13
gang        -0.13
asel        -0.13
Positive Logits
ency        0.17
ighthouse   0.16
Interval    0.16
VRT         0.15
dea         0.14
interval    0.14
interval    0.14
IDO         0.14
Rosenstein  0.14
Tro         0.14
Activations Density: 0.077%