Explanations
negative descriptors related to dishonesty or low quality
Negative Logits
eless       -0.16
unfavor     -0.15
lek         -0.15
anford      -0.15
isay        -0.15
dr          -0.14
oppon       -0.14
occasion    -0.14
iki         -0.14
riv         -0.13
Positive Logits
ÛĮز         0.17
anske       0.16
aires       0.16
Winds       0.15
.ide        0.15
agan        0.15
VILLE       0.15
pus         0.15
ville       0.14
Coh         0.14
Activations Density 0.073%