    Explanations

    references to laws or prohibitions

    Negative Logits
    ocks       -0.16
    alet       -0.16
    andler     -0.15
    yk         -0.15
    aid        -0.15
    ickey      -0.14
    chos       -0.14
    .ua        -0.14
    ÏĢον       -0.14
    cul        -0.13
    Positive Logits
    ishment     0.18
    adoo        0.17
    semble      0.15
    زد          0.14
    hatt        0.14
    ala         0.14
    veal        0.14
    ioneer      0.14
    DEM         0.14
    itore       0.14
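The two logit lists rank vocabulary tokens by how much this feature raises or lowers their output logits. The following is a minimal sketch of one common way such lists are produced. It assumes, without confirmation from this page, that each score is the dot product of the feature's decoder direction with the corresponding column of the model's unembedding matrix, with no extra normalization. The names `top_logit_effects`, `decoder_row`, and `unembed` are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (hypothetical helper).

    decoder_row: shape (d_model,), assumed to be the feature's decoder direction
    unembed:     shape (d_model, n_vocab), assumed to be the unembedding matrix
    vocab:       list of n_vocab token strings
    """
    effects = decoder_row @ unembed          # logit contribution per token, shape (n_vocab,)
    order = np.argsort(effects)              # token indices sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional model and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
unembed = np.array([[1.0, -1.0, 0.5, 0.0],
                    [0.0,  2.0, -0.5, 1.0]])
row = np.array([1.0, 0.5])
neg, pos = top_logit_effects(row, unembed, vocab, k=2)
print(neg)  # [('b', 0.0), ('c', 0.25)]
print(pos)  # [('a', 1.0), ('d', 0.5)]
```

In a real model the "negative" list contains tokens with negative scores, as in the list above. In this toy case every effect happens to be non-negative, so the bottom-k entries are simply the smallest values.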
    Activation Density: 0.026%
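An activation density of 0.026% corresponds to a fraction of 0.00026, or roughly 26 firing positions per 100,000 tokens. The sketch below assumes, without confirmation from this page, that density means the percentage of token positions where the feature's activation exceeds zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature fires above `threshold`.

    Hypothetical helper: assumes density is measured as a simple firing rate
    over a flat array of per-token activations.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 26 firing positions out of 100,000 tokens gives a density of about 0.026%.
acts = np.zeros(100_000)
acts[:26] = 1.0
density = activation_density(acts)
```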

    No Known Activations