INDEX
    Explanations

    references to political figures and their actions

    Negative Logits
    "ุà¹Ī"      -0.15
    "ulle"     -0.15
    "ella"     -0.15
    " Liv"     -0.14
    "ope"      -0.14
    "stack"    -0.14
    " stack"   -0.14
    "aroo"     -0.14
    " Goods"   -0.14
    " run"     -0.13
    Positive Logits
    "èħ"       0.16
    "รร"       0.15
    "/by"      0.15
    " İli"     0.15
    "duk"      0.15
    "é¬"       0.15
    "cov"      0.14
    " Giang"   0.14
    "ppers"    0.14
    "CID"      0.14
    Activation Density: 0.065%

    No Known Activations