    Explanations
    Negative Logits
    Token         Logit
    "oses"        -0.70
    "Guard"       -0.70
    "lean"        -0.66
    "respect"     -0.63
    "aq"          -0.62
    "gur"         -0.62
    "Gaza"        -0.60
    "le"          -0.60
    "´"           -0.59
    "eal"         -0.59
    Positive Logits
    Token         Logit
    " they"        0.95
    "soever"       0.90
    " there"       0.89
    " THEY"        0.89
    " we"          0.86
    " nobody"      0.83
    " unlike"      0.81
    " although"    0.79
    "*/("          0.76
    " it"          0.74
    Activation Density: 0.111%

    No Known Activations