INDEX
    Explanations

    directing actions or states

    Negative Logits
    Logit  Token
    0.49   (no visible token)
    0.47   "נים"
    0.45   "Estados"
    0.44   (no visible token)
    0.43   " कमला"
    0.43   " Бел"
    0.43   "Earn"
    0.43   "д"
    0.43   " прока"
    0.42   " בא"

    Positive Logits
    Logit  Token
    0.49   " sensed"
    0.46   " trafik"
    0.46   " potentiel"
    0.45   " sensing"
    0.44   " poids"
    0.44   " vanwege"
    0.43   " cruelty"
    0.43   " rider"
    0.42   " mulig"
    0.42   " potenz"

    Activation Density: 0.004%

    No Known Activations