INDEX
    Explanations
    NEGATIVE LOGITS
    Token        Logit
    "the"        1.71
    " be"        1.44
    "ut"         1.43
    "taining"    1.28
    "time"       1.13
    "ty"         1.10
    "does"       1.04
    "notes"      1.03
    "ک"          1.03
    "their"      1.02
    POSITIVE LOGITS
    Token        Logit
    "ה"          1.29
    "ك"          1.23
    " ("         1.03
    " Seit"      1.03
    "↵↵"         0.94
    " Pero"      0.93
    "كبر"        0.90
    " reine"     0.89
    " sorte"     0.88
    " верши"     0.86
    Activation Density: 0.002%

    No Known Activations