INDEX
Explanations
references to freedom of expression and rights related to speech
Negative Logits
OGND              -0.99
IsContent         -0.80
windowFixed       -0.74
FormTagHelper     -0.70
uxxxx             -0.70
хьтан             -0.70
snippetHide       -0.66
qrstuvwxyz        -0.64
anthropo          -0.64
matchCondition    -0.62
Positive Logits
freedom           1.16
expression        1.13
speech            1.12
free              0.99
Expression        0.96
freedom           0.95
Freedom           0.95
Freedom           0.95
Speech            0.93
Speech            0.93
Activations Density 0.362%