Explanations
words related to controversial or inappropriate topics, often related to social or political issues
Negative Logits
Token          Logit
Ability        -0.77
lihood         -0.73
Solution       -0.72
DIV            -0.71
ODUCT          -0.71
Information    -0.69
RESULTS        -0.69
Internal       -0.69
Development    -0.69
EGIN           -0.68
Positive Logits
Token          Logit
bikini          0.88
punk            0.85
blaster         0.81
boobs           0.81
kisses          0.80
disco           0.79
tits            0.79
themed          0.79
ovies           0.78
fries           0.77
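Dashboards of this kind commonly rank vocabulary tokens by how strongly the feature's output direction pushes each token's logit up or down. The sketch below shows that ranking under stated assumptions: the score for a token is the dot product of a hypothetical decoder vector `decoder_dir` with that token's column of a hypothetical unembedding matrix `unembed`. The exact scoring used for the values above, including any normalization, is not stated here, so treat this as an illustration rather than the dashboard's method.

```python
def top_and_bottom_logits(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by decoder_dir . unembed[:, t] (assumed scoring rule).

    decoder_dir: list of d_model floats (hypothetical feature decoder vector)
    unembed: d_model rows, each a list of len(vocab) floats (hypothetical W_U)
    vocab: list of token strings, one per unembedding column
    """
    d_model = len(decoder_dir)
    # One score per token: how much the feature direction boosts that token's logit.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive score.
    ranked = sorted(range(len(vocab)), key=lambda t: scores[t])
    bottom = [(vocab[t], scores[t]) for t in ranked[:k]]
    top = [(vocab[t], scores[t]) for t in ranked[::-1][:k]]
    return top, bottom


if __name__ == "__main__":
    top, bottom = top_and_bottom_logits(
        [1.0, 0.0],
        [[0.5, -0.2, 0.9], [0.1, 0.3, -0.4]],
        ["a", "b", "c"],
        k=1,
    )
    print("positive:", top)
    print("negative:", bottom)
```

Slicing the reversed list (rather than `ranked[-k:]`) keeps `k=0` returning an empty list instead of the whole vocabulary.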
Activation Density: 0.504%
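Activation density is usually read as the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and its `threshold` parameter are illustrative, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above threshold.

    activations: per-token activation values for one feature (assumed input)
    threshold: cutoff for counting a token as "firing" (assumed to be 0.0)
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```

Read this way, a density of 0.504% means the feature fires on roughly 5 of every 1,000 sampled tokens.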