    Explanations

    references to freedom of expression and rights related to speech

    Negative Logits
    "OGND"            -0.99
    "IsContent"       -0.80
    "windowFixed"     -0.74
    "FormTagHelper"   -0.70
    "uxxxx"           -0.70
    "хьтан"           -0.70
    " snippetHide"    -0.66
    "qrstuvwxyz"      -0.64
    "anthropo"        -0.64
    "matchCondition"  -0.62

    Positive Logits
    " freedom"        1.16
    " expression"     1.13
    " speech"         1.12
    " free"           0.99
    " Expression"     0.96
    "freedom"         0.95
    " Freedom"        0.95
    "Freedom"         0.95
    "Speech"          0.93
    " Speech"         0.93
    Activation Density: 0.362%

    No Known Activations