INDEX
Explanations
content related to free speech and its legal implications
Negative Logits
902        -0.15
iju        -0.14
gett       -0.14
Wallpaper  -0.14
811        -0.14
окол       -0.14
loggedin   -0.14
fm         -0.14
acin       -0.13
electr     -0.13
Positive Logits
speech     0.51
Speech     0.46
First      0.44
speech     0.42
Speech     0.42
free       0.39
peech      0.36
freedom    0.34
First      0.32
free       0.31
Activations Density 0.197%