    Negative Logits
    "g"     1.03
    "b"     0.94
    "t"     0.77
    "h"     0.76
    "de"    0.75
    "us"    0.74
    "y"     0.74
    "is"    0.70
    "c"     0.70
    ">"     0.68
    Positive Logits
    "ید"           0.79
    "ancienne"     0.69
    " bohater"     0.68
    "janje"        0.68
    "یم"           0.67
    "dropPanel"    0.65
    "affiche"      0.63
    " alumnos"     0.63
    " actriz"      0.63
    "йд"           0.62
    Activation Density: 0.013%

    No Known Activations