    Explanations

    describes simplicity or directness

    Negative Logits
    Token             Value
    " şeyler"         0.49
    " fortale"        0.48
    "経営"            0.45
    " важли"          0.44
    " doenças"        0.43
    " oorlog"         0.43
    " další"          0.42
    " nogle"          0.42
    " assuntos"       0.42
    " चुनौतियों"        0.42
    Positive Logits
    Token               Value
    " simply"           0.72
    " simple"           0.68
    " semplicemente"    0.63
    "simple"            0.60
    " simplemente"      0.60
    "只需"              0.59
    " straightforward"  0.59
    "simply"            0.59
    " exactly"          0.58
    " সরাসরি"           0.57
    Activation Density: 0.255%

    No Known Activations