    Negative Logits
    Token         Logit
    " нач"        -0.08
    " Reid"       -0.08
    " Mot"        -0.07
    " venom"      -0.07
    " induct"     -0.07
    " Rewrite"    -0.07
    "Ultimate"    -0.07
    " Lif"        -0.07
    "Tak"         -0.07
    " Tak"        -0.07
    Positive Logits
    Token         Logit
    "-hearted"    0.10
    "hearted"     0.08
    " burst"      0.08
    " mạnh"       0.08
                  0.08
    "broker"      0.08
    " curing"     0.08
    " faint"      0.08
    "ஞ்ச"         0.08
                  0.08
    Activation Density: 0.033%

    No Known Activations