INDEX
    Negative Logits
     ―――――          -1.00
     Theſe          -0.96
     Majefty        -0.91
     Anſ            -0.90
     Jefus          -0.90
    withstanding    -0.87
     Eſ             -0.87
     ――――           -0.86
     purpoſe        -0.86
     greateſt       -0.86
    Positive Logits
    <bos>           1.20
    ↵↵              0.82
                    0.69
    '               0.55
    #               0.50
                    0.47
     بتاريخ         0.46
    Étape           0.43
    s               0.42
     firstly        0.42
    Act Density 1.695%

    No Known Activations