Explanations
references to men's health and masculinity issues
Negative Logits
vip      -0.18
asting   -0.16
lauf     -0.16
ay       -0.15
iv       -0.15
d        -0.15
gers     -0.15
dong     -0.14
amac     -0.14
POCH     -0.14
Positive Logits
ubar     0.21
opause   0.17
aced     0.17
volent   0.17
Rah      0.15
ÏĤ       0.15
uli      0.15
ouver    0.15
folk     0.14
brit     0.14
Activations Density: 0.051%