INDEX
    NEGATIVE LOGITS
    theless     -0.64
    etheus      -0.63
    andom       -0.62
    rane        -0.60
    sonian      -0.57
    avior       -0.56
     threat     -0.54
     unknown    -0.54
    bon         -0.51
     bounty     -0.51
    POSITIVE LOGITS
     noticing   0.66
    ctor        0.65
    ãĤ¨ãĥ«      0.58
     answering  0.57
    ãĤ®         0.57
    peak        0.56
     fielding   0.55
     imagining  0.55
     Sega       0.55
     anybody    0.54
    Activation Density: 0.023%

    No Known Activations