INDEX
Explanations
phrases related to internet regulation and security concerns
Negative Logits
avra  -0.18
discharged  -0.16
discharge  -0.15
phetamine  -0.15
誠  -0.14
arias  -0.14
учас  -0.14
blown  -0.14
Redistributions  -0.14
avr  -0.13
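Tokenizers that use GPT-2-style byte-level BPE store non-ASCII tokens as strings of stand-in characters (for example `èªł` for 誠), and dashboards sometimes display that raw form. The sketch below assumes this model uses such a tokenizer, which the stand-in characters suggest but the page does not state. It rebuilds GPT-2's byte-to-character table and maps a displayed token back to readable text. The function names are illustrative, not part of any library.

```python
def bytes_to_unicode() -> dict[int, str]:
    # GPT-2's table: printable bytes map to themselves, every other byte
    # is shifted to a code point at 256 or above so it stays visible.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Reverse table: stand-in character -> original byte value.
_BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_byte_level(token: str) -> str:
    """Turn a byte-level BPE token string back into UTF-8 text (hypothetical helper)."""
    raw = bytes(_BYTE_DECODER[ch] for ch in token)
    # A single token can end mid-character, so undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")


print(decode_byte_level("èªł"))      # 誠
print(decode_byte_level("ÑĥÑĩÐ°Ñģ"))  # учас
print(repr(decode_byte_level("Ġblocked")))  # ' blocked' (Ġ encodes a leading space)
```

The leading-space case is also why a token such as "blocked" can appear twice in one list: the variants with and without a leading space are separate vocabulary entries.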
Positive Logits
blocking  0.31
blocked  0.31
filtering  0.30
content  0.28
blocks  0.28
block  0.27
blocked  0.27
Blocked  0.27
censor  0.27
blocking  0.27
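Lists like the positive and negative logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The page does not say how these particular values were computed, so the sketch below is an assumption about the general technique, not this dashboard's code. It uses plain Python lists (a real model would use a tensor library), and `decoder_dir`, `unembed` (laid out as d_model rows by vocab columns), and the helper names are hypothetical.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Direct effect of one unit of the feature on each token's logit."""
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal the number of unembed rows")
    vocab_size = len(unembed[0])
    # Dot product of the decoder direction with each token's unembedding column.
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab_size)]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int):
    """Return the k most-promoted and k most-suppressed (token, value) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy numbers, invented purely for illustration.
effects = logit_effects([1.0, 0.5], [[0.2, -0.1, 0.3], [0.4, 0.0, -0.6]])
positive, negative = top_and_bottom(effects, ["blocked", "discharged", "content"], k=1)
print(positive, negative)
```

Sorting the full vocabulary once and slicing both ends keeps the positive and negative lists consistent with each other.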
Activation Density 0.034%
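Activation density is usually the fraction of sampled tokens on which the feature fires, so 0.034% means roughly one token in every 2,900. The page does not give the threshold or the token sample behind its figure, so the sketch below assumes "fires" means an activation strictly above 0. `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative sample: 34 firing positions out of 100,000 tokens.
sample = [0.0] * 99_966 + [1.0] * 34
print(f"{activation_density(sample):.3%}")  # 0.034%
```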