    NEGATIVE LOGITS
    Token               Logit
    "ার"                1.04
    "𝘢"                 0.98
    "也就是说"            0.96
    "𝘪"                 0.96
    " фигур"            0.95
    "ці"                0.95
    " bebê"             0.93
    " préparations"     0.91
    " malfunctions"     0.91
    (token missing)     0.91
    POSITIVE LOGITS
    Token               Logit
    "s"                 1.14
    "m"                 1.07
    "news"              0.99
    "م"                 0.97
    "moto"              0.96
    "nian"              0.95
    "nou"               0.94
    "0"                 0.94
    "ot"                0.93
    "melt"              0.91
    Activation Density: 0.001%

    No Known Activations