INDEX
Explanations
references to internet regulation and censorship
Negative Logits
discharged    -0.16
compat        -0.15
TEGR          -0.15
_PROC         -0.14
phetamine     -0.14
:convert      -0.14
integr        -0.14
integ         -0.14
integration   -0.14
ipped         -0.13
Positive Logits
Internet      0.26
Content       0.25
Filtering     0.23
Domain        0.23
content       0.23
Internet      0.23
filtering     0.23
censorship    0.23
internet      0.23
domain        0.21
Activations Density: 0.042%