    Explanations

    concepts related to freedom of speech and its limitations

    Negative Logits
    " ap"            -0.15
    "CodeGen"        -0.15
    " Sec"           -0.14
    "fov"            -0.14
    " suspicious"    -0.14
    " amnesty"       -0.14
    " plea"          -0.14
    " fish"          -0.14
    "amura"          -0.14
    " EO"            -0.14
    Positive Logits
    " defamation"    0.22
    "arella"         0.18
    " publication"   0.18
    " lib"           0.18
    "publication"    0.18
    "漫"             0.18
    "dam"            0.18
    " Publications"  0.17
    "ontent"         0.16
    "-publish"       0.16
    Act Density 0.018%

    No Known Activations