INDEX
    Negative Logits
    Token    Value
    "is"     0.90
    "it"     0.84
    "ле"     0.79
    "ال"     0.77
    "να"     0.75
    "ра"     0.71
    "س"      0.70
    "ad"     0.67
    "бо"     0.67
    "ме"     0.67
    Positive Logits
    Token           Value
    (not shown)     0.91
    "-"             0.83
    (not shown)     0.81
    " écart"        0.77
    " jeopard"      0.76
    " welches"      0.68
    " catheters"    0.66
    "DI"            0.64
    " adı"          0.63
    " cheating"     0.63
    Activation Density: 1.992%

    No Known Activations