Explanations
specific references to substances or elements associated with health, crime, or personal items
Negative Logits
Niet       -0.52
predec     -0.48
Democr     -0.48
oppable    -0.47
undermin   -0.47
Vaugh      -0.45
Ire        -0.45
Topics     -0.44
advoc      -0.42
senal      -0.42
Positive Logits
›          0.55
¶          0.55
weed       0.47
↵          0.45
?          0.44
•          0.43
↵↵         0.43
↵ + no-break space (U+00A0)   0.42
][         0.42
/(         0.42
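The negative and positive logit lists look like a direct logit-effect view of the feature: each vocabulary token is scored by how much the feature pushes its output logit down or up. The sketch below rests on an assumption this page does not confirm: that the scores come from projecting the feature's decoder direction through the model's unembedding matrix, with no layer-norm scaling. `logit_effects`, its arguments, and the toy matrices are hypothetical illustrations.

```python
import numpy as np


def logit_effects(w_dec_feature: np.ndarray, w_unembed: np.ndarray,
                  vocab: list[str], k: int = 10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumes w_dec_feature has shape (d_model,) and w_unembed has shape
    (d_model, vocab_size); both are hypothetical inputs for illustration.
    """
    # Direct effect of the feature's decoder direction on every vocabulary logit.
    effects = w_dec_feature @ w_unembed  # shape: (vocab_size,)
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_dec = np.array([1.0, 0.0])
w_u = np.array([[0.5, -0.2, 0.1, -0.9],
                [3.0, 3.0, 3.0, 3.0]])
neg, pos = logit_effects(w_dec, w_u, vocab, k=2)
print(neg)  # [('d', -0.9), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

Because only the feature's own direction is projected, these scores ignore downstream layers; they describe what the feature writes directly toward the output, not what the full model ends up predicting.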
Activation density: 0.692%
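Activation density is the share of token positions on which the feature fires. By that reading, 0.692% means the feature is active on roughly 7 of every 1,000 tokens. The sketch below is a minimal illustration under two assumptions: that "fires" means an activation strictly above zero, and that the activations come from some sample of tokens, which this page does not specify. `activation_density` and the toy array are hypothetical.

```python
import numpy as np


def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes `activations` holds one feature's activation per token position
    (any shape); the zero threshold is an assumption, not a documented rule.
    """
    fired = activations > threshold
    return 100.0 * np.count_nonzero(fired) / fired.size


# Toy example: the feature fires on 2 of 8 token positions.
acts = np.array([0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.3])
print(activation_density(acts))  # 25.0
```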