INDEX
Explanations
words related to censorship or critical viewpoints
terms related to censorship
Negative Logits
Dragonbound  -0.78
Goo          -0.74
¯¯¯¯¯¯¯¯     -0.73
Rebellion    -0.72
Valhalla     -0.67
Hipp         -0.66
Werewolf     -0.66
RESULTS      -0.65
Tet          -0.65
Donetsk      -0.63
Positive Logits
orious   1.13
cens     1.13
orable   1.10
oring    1.08
orial    0.91
pling    0.90
¦        0.87
urable   0.87
ovy      0.86
ured     0.85
Activations Density 0.011%