INDEX
Explanations
mentions of being banned
instances of the word "banned" across various contexts
Negative Logits
eah          -0.80
issance      -0.73
IMAGES       -0.73
Generations  -0.69
Auth         -0.68
sie          -0.66
rious        -0.66
ickey        -0.64
ilant        -0.64
prise        -0.64
Positive Logits
hee          0.91
ishment      0.80
banning      0.77
netflix      0.76
banned       0.76
bans         0.73
substances   0.73
smoking      0.72
hammer       0.72
wana         0.71
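Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below is a minimal pure-Python illustration under that assumption (plain projection, no layer-norm folding or bias). `top_bottom_logits`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, not the API of any particular dashboard tool.

```python
from typing import List, Sequence, Tuple


def top_bottom_logits(
    decoder_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest and k lowest (token, logit) pairs.

    Assumption: W_U is laid out as d_model rows by len(vocab) columns,
    and the logit effect is a plain dot product (no layer norm or bias).
    """
    d_model = len(decoder_dir)
    # Logit effect for each vocab entry: dot(decoder_dir, column j of W_U).
    scores = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score; the front holds the most negative logits.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]
    top = ranked[::-1][:k]
    return top, bottom


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
top, bottom = top_bottom_logits(
    decoder_dir=[1.0, 0.0],
    W_U=[[0.9, -0.8, 0.1], [5.0, 5.0, 5.0]],
    vocab=["banned", "eah", "the"],
    k=1,
)
print(top, bottom)
```

With real weights, `k=10` would yield two ten-entry lists shaped like the ones above.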
Activation Density: 0.019%
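Activation density is usually read as the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0.0; `activation_density` and its `threshold` parameter are hypothetical, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 19 firing tokens out of 100,000 sampled tokens.
acts = [1.0] * 19 + [0.0] * (100_000 - 19)
print(activation_density(acts))
```

Under this reading, a density of 0.019% means the feature fires on roughly 19 of every 100,000 tokens, which fits a narrow feature like one tracking the word "banned".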