Explanations
references to control and censorship of information
Negative Logits
Token     Logit
ubl       -0.08
ÑĢаÑģ      -0.07
ovit      -0.07
ozo       -0.07
gnore     -0.07
μι        -0.07
arih      -0.06
weis      -0.06
EMY       -0.06
hir       -0.06
Positive Logits
Token         Logit
freedom       0.09
fre           0.09
censorship    0.08
freely        0.08
regime        0.08
Freedom       0.08
forbidden     0.07
freedoms      0.07
fre           0.07
underground   0.07
Activations Density 0.057%