    Negative Logits
        Token           Logit
        " své"          -0.07
        " McCoy"        -0.07
        " Speak"        -0.06
        " ihren"        -0.06
        " Giov"         -0.06
        " coincidence"  -0.06
                        -0.06
        " sắp"          -0.06
        "gv"            -0.06
        "Sing"          -0.06
    Positive Logits
        Token           Logit
        " ultra"         0.14
        " Ultra"         0.14
        "Ultra"          0.12
        "tra"            0.08
        " uh"            0.08
        ".rf"            0.07
        " intra"         0.07
        " ALPHA"         0.07
        "AFF"            0.06
        "ΡΑ"             0.06
    Activation density: 0.003%

    No Known Activations