INDEX
    Explanations
    NEGATIVE LOGITS
    Token           Logit
    " positif"      -0.08
    "ifo"           -0.08
    "iciência"      -0.07
    "ичная"         -0.07
    "Joshua"        -0.07
    "_negative"     -0.07
    " reconcile"    -0.07
    "ric"           -0.07
    " Churchill"    -0.07
    "umatoid"       -0.07
    POSITIVE LOGITS
    Token           Logit
    " Misch"        0.09
    ".edu"          0.08
    ".pdf"          0.08
    " Analytical"   0.08
    " estud"        0.07
    " MIX"          0.07
    " TOKEN"        0.07
    " advert"       0.07
    ";↵↵↵//"        0.07
    ".Configure"    0.07
    Activation Density: 0.001%

    No Known Activations