    NEGATIVE LOGITS
    Token            Value
    "?"              1.08
    "are"            0.83
    "6"              0.82
    "of"             0.80
    " prodigious"    0.79
    "ни"             0.77
    "erical"         0.77
    " schreibt"      0.76
    "9"              0.76
    " enggak"        0.75
    POSITIVE LOGITS
    Token            Value
    " ("             1.01
    "م"              1.01
    "ار"             0.98
    "מ"              0.96
    (not shown)      0.93
    "k"              0.92
    "h"              0.89
    "m"              0.86
    (not shown)      0.85
    "ور"             0.83
    Activation density: 0.006%

    No Known Activations