    NEGATIVE LOGITS
    Token            Logit
    "ATURES"         -0.83
    " گرم"           -0.81
    " UserModel"     -0.79
    " longtemps"     -0.75
    " psi"           -0.75
    " appréci"       -0.73
    " ater"          -0.70
    "めている"        -0.70
    "зина"           -0.69
    " 亚"            -0.69
    POSITIVE LOGITS
    Token              Logit
    " curiosity"        0.75
    "Làm"               0.75
    " Named"            0.74
    "探索"               0.74
    " reino"            0.72
    " stimulating"      0.71
    " codif"            0.71
    "towania"           0.71
    "čke"               0.71
    (token not shown)   0.70
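One common way to produce logit lists like these is to project the feature's decoder direction through the model's unembedding matrix, then take the most negative and most positive entries. The sketch below assumes that method. The names `logit_effects`, `decoder_vec`, `unembed` and `vocab` are hypothetical, and the dashboard may compute or scale its values differently.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed layout: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score every token by decoder_vec @ unembed and
    return (k most negative, k most positive) as (token, score) pairs."""
    d_model = len(decoder_vec)
    # score[t] = sum_i decoder_vec[i] * unembed[i][t]
    scores = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
neg, pos = logit_effects(
    decoder_vec=[1.0, -1.0],
    unembed=[[0.5, 0.0, -0.25, 1.0], [0.25, 0.5, 0.5, -0.5]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('c', -0.75), ('b', -0.5)]
print(pos)  # [('d', 1.5), ('a', 0.25)]
```

Sorting once and slicing both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a vectorized matrix product with a top-k selection would be the practical choice.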
    Activation density: 0.001%

    No known activations.
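Activation density is usually reported as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The function name and `threshold` parameter are hypothetical, and the dashboard's exact threshold and sampling set are not stated here.

```python
import math
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumed firing criterion
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * active / total


# One firing token out of 100,000 sampled tokens gives a density of 0.001%.
acts = [0.0] * 99_999 + [3.2]
print(math.isclose(activation_density(acts), 0.001))  # True
```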