    Explanations

    area measurement

    Negative Logits
     Wer            -0.06
    iyet            -0.06
    ğe              -0.06
     Frage          -0.06
    ('=             -0.06
    assador         -0.05
    ournament       -0.05
    ời              -0.05
    Kat             -0.05
     Outstanding    -0.05
    Positive Logits
    lerdi           0.07
    ynchron         0.07
    '><             0.07
    …………            0.07
     crumbling      0.07
    .first          0.06
    اي              0.06
    "]');↵          0.06
    .'</            0.06
    >()↵↵           0.06
    Activation Density: 0.007%

    No Known Activations