INDEX
    Explanations

    references to political figures and their actions

    Negative Logits
    ATRIX
    -0.18
    ëŀľëĵľ
    -0.17
     diren
    -0.17
    cestor
    -0.15
    raquo
    -0.15
     addCriterion
    -0.15
    istring
    -0.14
     tá»ij
    -0.14
     restau
    -0.14
    OKIE
    -0.14
    Positive Logits
    ÃĤ
    0.18
    â
    0.17
     _
    0.16
     (
    0.16
     said
    0.16
     gu
    0.15
     [â̦
    0.15
     
    0.15
    ,
    0.15
     â
    0.15
    Activation Density 0.038%

    No Known Activations