    Negative Logits
     suf        -0.08
    riger       -0.07
     Lingu      -0.06
    %;">↵       -0.06
     İb         -0.06
     ней        -0.06
     προς       -0.06
    (parser     -0.06
    Histor      -0.06
     feasible   -0.06
    Positive Logits
    Cfg         0.06
     thinking   0.06
     beside     0.06
    .DEBUG      0.06
     zákona     0.06
     bec        0.06
     건강         0.06
     asleep     0.06
     boob       0.06
     BaseEntity 0.06
    Activation Density: 0.023%

    No Known Activations