Explanations
words related to loss, harm, and negative consequences
Negative Logits
Token       Logit
minus       -0.17
undle       -0.16
ruba        -0.15
rets        -0.15
방          -0.14
Dank        -0.14
Hairst      -0.14
agt         -0.14
lessness    -0.13
idine       -0.13
Positive Logits
Token             Logit
edla              0.15
baum              0.14
LING              0.14
inka              0.14
amera             0.14
posables          0.14
ayın              0.14
.scalablytyped    0.14
employment        0.13
Touches           0.13
Activations Density 0.023%