    NEGATIVE LOGITS
    Token          Value
    " you"         1.53
    "н"            1.52
    " are"         1.39
    (not shown)    1.28
    " who"         1.22
    "м"            1.18
    (not shown)    1.18
    "ない"         1.13
    "ни"           1.13
    "1"            1.13
    POSITIVE LOGITS
    Token          Value
    "O"            1.08
    "R"            1.02
    "OL"           1.01
    "b"            1.01
    "RI"           0.99
    "avio"         0.98
    "B"            0.94
    " આવા"         0.93
    "H"            0.93
    "U"            0.91
    Activation Density: 0.003%

    No Known Activations