INDEX
Explanations
terms related to smoking and its effects
smokers or smoothers
Negative Logits
채        -0.36
Hentet   -0.35
γ        -0.34
бути     -0.34
pick     -0.34
fär      -0.33
Kund     -0.32
drauf    -0.32
engaged  -0.32
tagged   -0.32
Positive Logits
smo       0.93
Smo       0.92
Smo       0.89
smo       0.74
Efq       0.66
pleaſure  0.65
myſelf    0.64
smoothed  0.62
smog      0.61
Monfieur  0.60
Activations Density 0.008%