INDEX
Explanations
words related to drunkenness
references to drunkenness and related negative behaviors
Negative Logits
acea      -0.92
atro      -0.92
ioxide    -0.85
ILA       -0.82
agnetic   -0.78
代        -0.78
ramid     -0.77
irtual    -0.75
isite     -0.74
ordon     -0.74
Positive Logits
ness        1.24
ly          1.12
nesses      0.84
drunken     0.82
liness      0.82
sailor      0.74
antics      0.73
lust        0.72
disorderly  0.72
indisc      0.72
Activations Density 0.044%