INDEX
    Explanations

    standing up for oneself or others

    Negative Logits
    " aument"    0.67
    " en"        0.64
    "Unter"      0.64
    "Reduce"     0.61
    "Land"       0.60
    " Caribe"    0.59
    "u"          0.59
    " lleg"      0.59
    " gris"      0.59
    " enviar"    0.59
    Positive Logits
    " défendre"  0.63
    "мол"        0.62
    "ukone"      0.62
    "نگ"         0.56
    " intérêts"  0.56
    "ці"         0.55
    "ρα"         0.55
    "چھ"         0.55
    "фта"        0.55
    "avaient"    0.55
    Activation Density: 0.016%

    No Known Activations