INDEX
    Explanations

    varies, varied, intensified

    Negative Logits
    ;  0.73
    :  0.72
     porque  0.72
    .  0.69
     deoarece  0.67
     ซึ่ง  0.67
     (  0.66
     Parce  0.65
     creando  0.65
     because  0.64
    Positive Logits
    はこちら  0.90
    ଣ୍  0.87
     점점  0.83
    日は  0.81
     नव्ह  0.80
    0.80
    かは  0.79
    はこの  0.79
     wasnt  0.79
     ಕಡಿ  0.78
    Act Density 1.016%

    No Known Activations