INDEX
    Explanations

    references to academic articles and their attributes

    Negative Logits
    Token         Logit
    "oyer"        -0.16
    "contacts"    -0.16
    "oref"        -0.16
    "éra"         -0.16
    " BITTE"      -0.16
    "utsch"       -0.16
    "illow"       -0.15
    "ailand"      -0.15
    "ltk"         -0.15
    "lore"        -0.14

    Positive Logits
    Token         Logit
    "opoulos"     0.17
    "uary"        0.15
    " bevor"      0.15
    " redistrib"  0.14
    "ucc"         0.14
    "uss"         0.14
    "uter"        0.14
    "cho"         0.14
    "cle"         0.13
    " Hin"        0.13
    Activation Density: 0.002%

    No Known Activations