    Negative Logits
        ".Run"          -0.07
        "/domain"       -0.06
        " Circus"       -0.06
        "Ratio"         -0.06
        " crap"         -0.06
        "attro"         -0.06
        " appreh"       -0.06
        "TemplateName"  -0.06
        "Chicken"       -0.06
        " Correct"      -0.06
    Positive Logits
        " nostalgic"    0.08
        "怀"             0.07
        " nostalgia"    0.07
        " GN"           0.06
        " адміністра"   0.06
        " Ül"           0.06
        "argo"          0.06
        " Eg"           0.06
        "(cf"           0.06
        ".aw"           0.06
    Activation density: 0.005%

    No known activations.