    Negative Logits
     WL             -0.07
    ποτε            -0.07
    ecurity         -0.06
    -conf           -0.06
    ороз            -0.06
    (blank)         -0.06
    nw              -0.06
    (blank)         -0.06
    ानन             -0.06
    gain            -0.06
    Positive Logits
     assaulted      0.07
     Attribution    0.07
     Lesser         0.06
    Orth            0.06
    .balance        0.06
    Automatic       0.06
    areas           0.06
     affirmation    0.06
     richt          0.06
     aument         0.06
    Activation Density: 0.000%

    No Known Activations