    Explanations

    discussions around social and political actions and their implications

    Negative Logits
    Token           Logit
    "apolis"        -0.17
    "ilig"          -0.16
    " done"         -0.15
    "beg"           -0.14
    " exert"        -0.14
    " undertaken"   -0.14
    "isko"          -0.14
    "dup"           -0.14
    "irit"          -0.13
    "qli"           -0.13
    Positive Logits
    Token           Logit
    " worthy"       0.16
    "onto"          0.15
    " необходим"    0.15
    "worthy"        0.15
    " Worth"        0.14
    "瓜"            0.14
    "oad"           0.14
    "ара"           0.14
    "equal"         0.13
    ".sleep"        0.13
    Activation Density: 0.629%

    No Known Activations