    NEGATIVE LOGITS
     cease                -0.07
    τών                   -0.07
    ategor                -0.07
    ností                 -0.07
    taj                   -0.07
    )↵↵↵↵↵↵               -0.06
    (token not shown)     -0.06
    :↵↵↵                  -0.06
    ερ                    -0.06
     стари                -0.06
    POSITIVE LOGITS
     leaf                 0.07
    При                   0.06
    .runtime              0.06
    Role                  0.06
     Franco               0.06
     muse                 0.06
    pany                  0.06
     offender             0.06
    .Linq                 0.06
     Sing                 0.06
    Activation Density: 0.011%

    No Known Activations