INDEX
Explanations
mentions of being under the influence of substances
signals or indicators of drug use and potential intoxication
Negative Logits
displayText    -0.65
partName       -0.57
lineback       -0.56
antage         -0.55
cour           -0.55
igi            -0.53
ctors          -0.52
ofi            -0.51
vironments     -0.50
uitive         -0.50
Positive Logits
inappropriately    0.76
improperly         0.76
"'                 0.72
nefarious          0.71
illegally          0.67
unfairly           0.66
illegitimate       0.65
prostitutes        0.64
improper           0.63
wrongly            0.63
Activations Density: 1.538%