    Negative Logits
    Token            Logit
    " medium"        -0.73
    " consistent"    -0.68
    " output"        -0.66
    "output"         -0.62
    " moderate"      -0.60
    " input"         -0.59
    " Output"        -0.59
    " Consistent"    -0.59
    "Consistent"     -0.52
    "gelijk"         -0.51
    Positive Logits
    Token                 Logit
    " hooves"             0.71
    "ioutil"              0.68
    "providedIn"          0.68
    "צלחה"                0.63
    (no visible token)    0.63
    " floats"             0.62
    " boughs"             0.62
    "httphttps"           0.62
    " strung"             0.61
    " Mez"                0.61
    Activation Density 0.065%

    No Known Activations