    Explanations

    translate, calc, or gradient

    Negative Logits
    "ही"         -0.76
    "преки"      -0.75
    "خابات"      -0.72
    "Celtic"     -0.72
    " still"     -0.71
    " samb"      -0.71
    " труд"      -0.69
    " Neuron"    -0.68
    "ichy"       -0.68
    " DUKE"      -0.68

    Positive Logits
    "Calc"        0.87
    " calc"       0.87
    " translate"  0.86
    "czył"        0.85
    "gura"        0.85
    "らは"        0.84
    "calc"        0.82
    " twist"      0.81
    "translate"   0.81
                  0.80

    Activation Density: 0.027%

    No Known Activations