INDEX
    Explanations
    Negative Logits
    Token               Logit
    "_PK"               -0.08
    (blank)             -0.08
    " JE"               -0.08
    " loudly"           -0.08
    "ИЯ"                -0.07
    "_______________"   -0.07
    "Bol"               -0.07
    "_pk"               -0.07
    (blank)             -0.07
    "ORE"               -0.07
    Positive Logits
    Token               Logit
    " Comunicação"      0.08
    " emergence"        0.08
    " warfare"          0.08
    " Sadly"            0.08
    " zpr"              0.07
    " cuerpos"          0.07
    " gene"             0.07
    " ys"               0.07
    " démocr"           0.07
    " estate"           0.07
    Activation Density: 0.002%

    No Known Activations