    NEGATIVE LOGITS
    Token              Logit
    "tumblr"           -0.71
    " manipulating"    -0.60
    " legacy"          -0.58
    " tumble"          -0.58
    " supposedly"      -0.57
    " partying"        -0.57
    " evolved"         -0.57
    " unite"           -0.57
    " inheritance"     -0.56
    " originated"      -0.55
    POSITIVE LOGITS
    Token              Logit
    "anton"            0.81
    "ijn"              0.76
    "jit"              0.74
    "lee"              0.74
    "aji"              0.73
    " Cohen"           0.73
    "elman"            0.73
    " Shapiro"         0.73
    "elli"             0.72
    "arie"             0.72
    Activation Density: 0.424%

    No Known Activations