INDEX
    Explanations

    mentions of specific names and locations

    Negative Logits
    Token           Logit
    "interrupted"   -1.01
    "ipolar"        -0.99
    "antha"         -0.98
    " metic"        -0.91
    "usher"         -0.90
    " eleph"        -0.90
    " circum"       -0.86
    "kernel"        -0.85
    " allowances"   -0.85
    " occas"        -0.84

    Positive Logits
    Token           Logit
    "brate"          2.06
    "brates"         1.92
    "llers"          1.49
    "ller"           1.48
    "achers"         1.37
    "levision"       1.37
    "achable"        1.34
    "ppo"            1.29
    "xit"            1.26
    "legraph"        1.25
    Activation density: 0.854%

    No Known Activations