Explanations
references to smoking activities and related contexts
Negative Logits
Token        Logit
adele        -0.15
errat        -0.14
otts         -0.14
uppy         -0.13
hostages     -0.13
بع           -0.13
hostage      -0.13
ç§ĭ          -0.12
CTL          -0.12
brightness   -0.12
Positive Logits
Token        Logit
cigarettes   0.35
tobacco      0.33
cigarette    0.33
pipe         0.32
pipes        0.31
cigars       0.30
smoking      0.29
cigar        0.28
Pipes        0.27
smoke        0.27
Activations Density: 0.045%
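
To make the density figure concrete, here is a minimal Python sketch. It assumes "activations density" means the fraction of token positions on which the feature has a nonzero activation; the dashboard itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 5 of them active.
toy = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.045% would correspond to the feature firing on roughly 4 to 5 of every 10,000 tokens.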