INDEX
    Explanations
    Negative Logits
    Token             Logit
    " INTERESAR"      -0.62
    " Vikipedi"       -0.61
    " Оно"            -0.56
    " Himself"        -0.49
    "LEncoder"        -0.48
    " cùng"           -0.48
    " собі"           -0.47
    " myself"         -0.46
    "arith"           -0.45
    " parlé"          -0.45
    Positive Logits
    Token             Logit
    " the"            0.96
    " how"            0.93
    " whether"        0.90
    " much"           0.69
    " most"           0.69
    " MUCH"           0.68
    "much"            0.66
    " многое"         0.66
    " what"           0.64
    " alot"           0.63
    Activation Density: 0.002%

    No Known Activations