INDEX
    Explanations

    science/scientist

    Negative Logits
    " Nab"        -0.08
    " Herr"       -0.08
    "nung"        -0.08
    " Leh"        -0.08
    " Macbeth"    -0.08
    " Menn"       -0.08
    " Edmund"     -0.07
    " Quinn"      -0.07
    ".ed"         -0.07
    " cray"       -0.07
    Positive Logits
    "business"      0.08
    " Twitter"      0.08
    " assistants"   0.08
    "್ಯಾಸ"          0.07
    "ifi"           0.07
    " pipeline"     0.07
    "pipeline"      0.07
    "Pipeline"      0.07
    "liers"         0.07
    " unicorn"      0.07
    Activation Density: 0.005%

    No Known Activations