    NEGATIVE LOGITS
    Card       -0.07
    ’ét        -0.07
     card      -0.07
    agne       -0.07
    reflect    -0.07
    ETS        -0.07
     geopol    -0.07
    ....↵↵     -0.07
    ette       -0.06
    monkey     -0.06
    POSITIVE LOGITS
     Bul       0.09
     bul       0.08
    กต         0.07
               0.07
     sil       0.06
     Lor       0.06
    DAL        0.06
     Ol        0.06
     tot       0.06
    --;        0.06
    Act Density 0.005%

    No Known Activations