INDEX
    Explanations

    references to political figures and their associated actions

    Negative Logits
    Token            Logit
    " mb"            -0.15
    "argon"          -0.15
    "alker"          -0.14
    "avig"           -0.14
    " instrument"    -0.14
    "MB"             -0.14
    " hollow"        -0.14
    "andin"          -0.14
    "ationale"       -0.14
    "otions"         -0.14
    Positive Logits
    Token            Logit
    "ti"             0.21
    "sis"            0.20
    "tain"           0.20
    "si"             0.19
    "ture"           0.19
    "tures"          0.19
    "tu"             0.18
    "bil"            0.18
    "ni"             0.17
    "tle"            0.17
    Activation Density: 0.012%

    No Known Activations