INDEX
    Explanations

    modding and mode imputation

    Negative Logits
        " are"    1.19
        " is"     0.96
        "um"      0.86
        "ı"       0.84
        "۔"       0.81
        " I"      0.80
        " Are"    0.80
        " e"      0.80
        " y"      0.79
        " a"      0.79
    Positive Logits
        "("                1.11
        "ل"                1.05
        (not rendered)     1.00
        "った"             0.95
        "ле"               0.91
        (not rendered)     0.90
        "ب"                0.89
        "л"                0.88
        "з"                0.87
        (not rendered)     0.87
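The page does not say how the logit lists are produced. This minimal sketch assumes the usual convention for feature dashboards: every vocabulary token gets a score equal to the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. The positive list holds the highest scores and the negative list holds the lowest. All names (`feature_logits`, `top_logits`, `decoder_vec`, `unembed`) are hypothetical, and the sketch returns signed scores.

```python
from typing import List, Sequence, Tuple


def feature_logits(decoder_vec: Sequence[float],
                   unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score each vocab token by decoder_vec . unembed[:, token].

    Assumption: `unembed` is a d_model x vocab matrix (list of rows) and
    `decoder_vec` is the feature's d_model-dimensional decoder direction.
    """
    d_model = len(decoder_vec)
    vocab = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab)
    ]


def top_logits(scores: Sequence[float], tokens: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest and k lowest scoring tokens."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending
    negative = [(tokens[t], scores[t]) for t in order[:k]]        # most negative first
    positive = [(tokens[t], scores[t]) for t in reversed(order[-k:])]  # most positive first
    return positive, negative


# Toy usage: a 2-dimensional residual stream and a 4-token vocabulary.
scores = feature_logits([1.0, -1.0],
                        [[1.0, 0.0, 2.0, -1.0],
                         [0.0, 1.0, 0.5, 1.0]])
positive, negative = top_logits(scores, ["a", "b", "c", "d"], k=2)
print(positive)  # [('c', 1.5), ('a', 1.0)]
print(negative)  # [('d', -2.0), ('b', -1.0)]
```

A real model would use a matrix library for the projection. Plain lists keep the sketch dependency-free.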
    Act Density 0.046%
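The page also does not define "Act Density". A common reading is the percentage of sampled tokens on which the feature's activation is above zero. On that reading, 0.046% is about 46 firing tokens per 100,000. This minimal sketch assumes that definition. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: density is counted over all sampled tokens, and a feature
    "fires" when its activation is strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy usage: 46 nonzero activations among 100,000 tokens.
acts = [1.0] * 46 + [0.0] * (100_000 - 46)
print(activation_density(acts))  # 0.046
```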

    No Known Activations