INDEX
    Explanations

    metrics related to political approval ratings

    Negative Logits
    eti               -0.15
    CCA               -0.15
    agina             -0.14
    ovky              -0.14
    evity             -0.14
    entai             -0.14
    ỡ                 -0.14
    obody             -0.14
    ouser             -0.14
    PasswordEncoder   -0.14

    Positive Logits
    &action            0.15
     Fle               0.15
    rops               0.15
    702                0.14
    zim                0.13
    house              0.13
     Dise              0.13
     unp               0.13
     explicitly        0.13
     House             0.13

    Act Density 0.065%

    No Known Activations