INDEX
Explanations
references to censorship and related controversies
mentions of censorship and its related concepts
Negative Logits
Token       Logit
Lakes       -0.81
amac        -0.79
Leary       -0.78
ths         -0.78
ptoms       -0.73
thritis     -0.72
nie         -0.72
thus        -0.72
ndra        -0.72
acts        -0.71
Positive Logits
Token        Logit
censorship   1.45
censor       1.33
cens         1.21
censored     1.13
suppression  0.82
suppressing  0.78
ourgeois     0.77
cutter       0.77
disadvant    0.75
blackout     0.75
Activations Density 0.012%