    NEGATIVE LOGITS
    Token         Logit
    "nou"         -0.08
    "ANTE"        -0.07
    "nest"        -0.07
    " Stanton"    -0.07
    " Emmanuel"   -0.07
    " Сан"        -0.07
    " Stanley"    -0.07
    " Nelson"     -0.07
    " Brent"      -0.07
    (blank)       -0.07
    POSITIVE LOGITS
    Token         Logit
    " diff"       0.16
    "Diff"        0.15
    "diff"        0.14
    " Diff"       0.14
    "_diff"       0.14
    "DIFF"        0.11
    " DIFF"       0.10
    ".diff"       0.10
    "if"          0.10
    "(diff"       0.10
    Activation Density: 0.013%

    No Known Activations