INDEX
    Explanations

    citations or references to specific research studies and their publication years

    Negative Logits
     Briggs   -0.15
    ASY       -0.15
    arch      -0.15
    opleft    -0.14
    erez      -0.14
    elle      -0.14
    ola       -0.14
    elles     -0.13
     cle      -0.13
    azzi      -0.13
    Positive Logits
    licht     0.16
    ubre      0.14
    eder      0.14
    dux       0.14
    dsn       0.14
     Fior     0.13
    hist      0.13
    reed      0.13
    utory     0.13
    ại        0.13
    Activation Density: 0.030%

    No Known Activations