    Tokens are shown in quotes; a leading space is part of the token.

    Negative Logits
        " Bundy"      -0.85
        "eln"         -0.81
        "lar"         -0.74
        "vernment"    -0.73
        "chief"       -0.72
        "gotten"      -0.69
        "atari"       -0.69
        "edia"        -0.68
        "henko"       -0.67
        "atl"         -0.66

    Positive Logits
        "Age"          0.78
        " age"         0.76
        " cohorts"     0.75
        "liest"        0.72
        "lier"         0.72
        "angering"     0.70
        " ages"        0.64
        "Wallet"       0.63
        " compulsory"  0.63
        "gap"          0.61
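    The page does not say how these two lists were produced. One common approach, assumed here and not confirmed by the source, is a "logit lens" projection: multiply the feature's decoder direction by the model's unembedding matrix, then report the tokens with the most positive and most negative results. The sketch below shows that ranking step. The names `feature_logits`, `top_and_bottom`, `decoder_dir` and `unembed`, and all numbers in it, are hypothetical toy values, not the weights behind this feature.

```python
# Hypothetical sketch (assumption: lists come from a logit-lens projection).
# Ranks vocabulary tokens by how strongly a feature's decoder direction
# raises or lowers each token's logit. All values are illustrative toys.


def feature_logits(decoder_dir, unembed):
    """Return one logit contribution per vocabulary token.

    decoder_dir: list of d_model floats (the feature's output direction)
    unembed: d_model rows, each a list of vocab_size floats (like W_U)
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(logits, vocab, k):
    """Return the k most positive and k most negative (token, logit) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    vocab = ["Age", " age", " Bundy", "eln", "gap"]
    decoder_dir = [1.0, -0.5]    # toy 2-dimensional residual stream
    unembed = [
        [0.5, 0.4, -0.6, -0.2, 0.1],
        [-0.2, -0.2, 0.5, 1.0, 0.0],
    ]
    logits = feature_logits(decoder_dir, unembed)
    pos, neg = top_and_bottom(logits, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

    Sorting once and slicing both ends keeps the positive and negative lists consistent with each other. A real model would use a large matrix library rather than nested Python loops.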
    Activation density: 0.028%
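    Activation density is read here as the percentage of evaluated tokens on which the feature fires. The page does not define its threshold or token sample, so this sketch assumes "fires" means an activation strictly greater than zero. Under that reading, 0.028% means roughly 28 firing tokens per 100,000. The function names are hypothetical.

```python
# Hypothetical sketch (assumption: "active" means activation > 0).


def activation_density(activations):
    """Percentage of tokens whose activation is strictly positive."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


def expected_active_tokens(density_percent, num_tokens):
    """Average number of firing tokens implied by a density percentage."""
    return density_percent / 100.0 * num_tokens


if __name__ == "__main__":
    # 0.028% density: about 28 firing tokens per 100,000 evaluated tokens.
    print(expected_active_tokens(0.028, 100_000))
    sample = [0.0] * 99_972 + [1.3] * 28  # toy activations, not real data
    print(activation_density(sample))
```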

    No Known Activations