INDEX
    Explanations
    Negative Logits
    Token       Logit
    "gili"      -0.17
    "riad"      -0.17
    "ceb"       -0.15
    "affen"     -0.15
    "lij"       -0.14
    "zig"       -0.14
    "onom"      -0.14
    " Roose"    -0.14
    "STALL"     -0.14
    "eldon"     -0.14
    Positive Logits
    Token             Logit
    "ing"             0.17
    " Shr"            0.15
    " Grace"          0.15
    " discrimin"      0.15
    "umont"           0.14
    " grace"          0.14
    "olarity"         0.14
    "Grace"           0.14
    " discriminate"   0.13
    "_defs"           0.13
    Activation Density: 0.008%

    No Known Activations