INDEX
Explanations
phrases related to censorship or being censored
mentions of censorship and related concepts
Negative Logits
¯¯¯¯¯¯¯¯    -0.80
WAYS        -0.76
docker      -0.74
RESULTS     -0.71
amac        -0.70
MER         -0.69
deals       -0.68
Dee         -0.67
Origins     -0.66
ESA         -0.66
Positive Logits
cens        1.63
censor      1.04
orious      0.92
oring       0.86
cens        0.82
censorship  0.81
orship      0.80
censored    0.79
orable      0.77
uitous      0.76
Activation density: 0.005%