Explanations
mentions of the word "adult" with different connotations
references to adults and adult themes
Negative Logits
adr        -0.93
atility    -0.82
ãĤ¡        -0.79
wark       -0.72
veyard     -0.70
atile      -0.70
href       -0.68
SOURCE     -0.65
bley       -0.65
raged      -0.65
Positive Logits
erer          0.95
Swim          0.90
erers         0.89
male          0.80
beverages     0.79
males         0.78
Friend        0.78
sized         0.76
estate        0.75
supervision   0.75
Activations Density: 0.016%
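
A density figure like this is usually read as the fraction of token positions on which the feature fires at all. The sketch below shows that calculation under this assumption; the dashboard's exact definition is not stated here. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, and the sample is sized only so that the arithmetic lands on the same percentage.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a non-zero
    activation. This is an illustrative definition, not the dashboard's code.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 16 of which fire.
acts = [0.0] * 100_000
for position in range(16):
    acts[position * 5_000] = 1.0  # arbitrary non-zero activation

density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low means the feature is silent on almost every token, which is consistent with a feature tied to one specific word or theme.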