    Negative Logits
    Token         Logit
    " degust"     -0.91
    "ídas"        -0.84
    " nā"         -0.81
    " آمار"       -0.79
    " @}"         -0.78
    " &_"         -0.77
    "ianto"       -0.77
    " volte"      -0.77
    "Economia"    -0.77
    "adering"     -0.76
    Positive Logits
    Token         Logit
    "ll"          1.15
    "cl"          1.05
    "left"        0.95
    "lll"         0.95
    "wl"          0.93
    " left"       0.92
    "LikeLike"    0.91
    "cll"         0.90
    "rl"          0.90
    "oll"         0.88
    Activation Density: 0.003%

    No Known Activations