Explanations
words related to censorship
terms related to censorship and its implications
Negative Logits
verty       -0.77
ndra        -0.77
Mead        -0.75
ptoms       -0.74
ilater      -0.73
amac        -0.73
Dew         -0.69
tell        -0.67
rious       -0.66
swick       -0.66
Positive Logits
censorship   1.07
cens         0.98
censor       0.96
censored     0.93
suppressing  0.78
zers         0.72
blackout     0.70
suppression  0.70
ourgeois     0.68
prohibited   0.66
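Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most-promoted and most-suppressed tokens. The sketch below assumes that procedure; the page does not state its exact method or any normalization, and the function name `logit_effects`, its parameters, and the toy matrices are hypothetical.

```python
import numpy as np

def logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Hypothetical sketch: score every vocabulary token by the dot product of one
    feature's decoder direction with that token's unembedding column, then return
    the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    # w_dec_row: shape (d_model,); w_unembed: shape (d_model, vocab_size)
    effects = np.asarray(w_dec_row) @ np.asarray(w_unembed)  # shape (vocab_size,)
    order = np.argsort(effects)  # ascending: most negative first
    bottom = [(vocab[i], float(effects[i])) for i in order[:k]]
    top = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy example with d_model = 2 and a four-token vocabulary (made-up numbers).
vocab = ["censorship", "verty", "the", "blackout"]
w_dec_row = [1.0, 0.0]
w_unembed = [[0.9, -0.5, 0.1, 0.3],
             [0.0, 0.0, 0.0, 0.0]]
top, bottom = logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(top)     # highest-scoring tokens first
print(bottom)  # lowest-scoring tokens first
```

Sorting once with `argsort` and slicing both ends keeps the positive and negative lists consistent with each other.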
Activation Density: 0.048%
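Activation density is usually read as the fraction of tokens on which the feature fires, so 0.048% corresponds to roughly 48 of every 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero; the dashboard's actual threshold and token sample are not stated, and the `activation_density` helper is hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: fraction of tokens whose feature activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    # Boolean mask of firing tokens; its mean is the firing fraction.
    return float(np.mean(acts > threshold))

# Toy example: the feature fires on 48 out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:48] = 1.5
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```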