INDEX
    Explanations

    references to societal and systemic structures or effects

    Negative Logits
    ilot      -0.16
    oma       -0.16
    iano      -0.15
    kop       -0.14
    fa        -0.14
    usion     -0.14
    xis       -0.14
    tes       -0.14
    internal  -0.14
     already  -0.14
    Positive Logits
     lors     0.17
    お        0.16
     هنگام    0.15
    lessly    0.15
    čně       0.14
    148       0.14
    /Math     0.14
     during   0.14
    rug       0.13
    czas      0.13
    Act Density 0.003%

    No Known Activations