Explanations
descriptors of a person's character and behavior, especially in terms of negative traits
Negative Logits
lea        -0.16
å¡         -0.15
Encoder    -0.15
feit       -0.15
ought      -0.15
ãĥIJãĤ¹     -0.14
misunder   -0.14
UTILITY    -0.14
ÑģоÑĩ      -0.14
ieber      -0.13
Positive Logits
temper     0.24
drinking   0.23
mood       0.23
physical   0.19
prom       0.19
eg         0.19
drink      0.18
Drinking   0.17
temp       0.17
enge       0.17
Activations Density 0.058%