Explanations
people's names
references to specific individuals and topics related to genitalia and cultural practices
Negative Logits
gery      -0.80
holes     -0.75
escape    -0.74
udeb      -0.74
ggle      -0.72
lde       -0.72
eming     -0.72
kok       -0.71
kers      -0.71
#$        -0.70
Positive Logits
ine         0.93
iated       0.92
inant       0.89
ially       0.87
aneous      0.87
inia        0.84
aneously    0.83
iple        0.79
iating      0.77
inations    0.77
Activations Density: 0.063%