INDEX
    Explanations

    references to censorship and controversial speech issues

    Negative Logits
     localVar   -0.14
    lint        -0.14
    olumn       -0.14
    æķĻ         -0.14
     Santana    -0.13
    GS          -0.13
    usat        -0.13
     Morav      -0.13
    opup        -0.13
    383         -0.13
    Positive Logits
     Alt         0.38
     alt         0.37
    Alt          0.35
    -alt         0.32
     ALT         0.29
    .alt         0.27
    _alt         0.27
    ALT          0.26
     Milo        0.25
    alt          0.24
    Activation Density: 0.081%

    No Known Activations