Explanations
references to alcohol use and its effects
Negative Logits
wart         -0.17
.emf         -0.15
Watkins      -0.14
\grid        -0.14
願い         -0.14
leck         -0.14
gebn         -0.13
Electricity  -0.13
руп          -0.13
oley         -0.13
Positive Logits
tips         0.38
tips         0.32
Tips         0.32
Tips         0.32
wasted       0.28
dr           0.28
dr           0.27
hammered     0.27
-dr          0.26
dru          0.25
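Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and then reading off the most negative and most positive vocabulary entries. The sketch below is a minimal, pure-Python illustration of that idea, not this dashboard's actual pipeline. It assumes a decoder vector `w_dec` of length `d_model` and an unembedding `W_U` with `d_model` rows and one column per token. Both helpers, `logit_effects` and `top_logits`, are hypothetical names.

```python
from typing import List, Sequence, Tuple

def logit_effects(w_dec: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U (d_model rows x vocab columns).
    Returns one direct logit effect per vocabulary token."""
    d_model = len(w_dec)
    vocab = len(W_U[0])
    return [sum(w_dec[i] * W_U[i][t] for i in range(d_model))
            for t in range(vocab)]

def top_logits(effects: Sequence[float], tokens: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy example with made-up numbers (d_model = 2, vocab = 4).
tokens = ["tips", "wasted", "wart", "Electricity"]
w_dec = [1.0, 0.5]
W_U = [[0.3, 0.2, -0.1, 0.0],
       [0.1, 0.1, -0.2, -0.3]]
neg, pos = top_logits(logit_effects(w_dec, W_U), tokens, k=2)
```

In the toy example, `neg` ranks `wart` ahead of `Electricity` and `pos` ranks `tips` ahead of `wasted`. Real dashboards may also normalize the decoder vector or apply a final layer-norm scale first, so absolute values need not match this sketch.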
Activations Density 0.116%
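Activation density is usually read as the share of token positions on which the feature fires at all. A figure of 0.116% therefore means roughly 1.16 tokens per thousand. The sketch below assumes that a token "fires" when its activation exceeds a threshold of 0. The function name `activation_density` and the threshold choice are illustrative assumptions, not this dashboard's documented definition.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose feature activation exceeds `threshold`.
    The threshold of 0.0 is an assumption, not a documented setting."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy data: the feature fires on 116 of 100,000 token positions.
acts = [0.0] * 99_884 + [1.0] * 116
density = activation_density(acts)
```

A sparse feature tied to a narrow topic is expected to have a low density like this one. Raising the threshold can only lower the reported percentage, because fewer activations clear it.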