INDEX
    Explanations

    was followed by past states

    Negative Logits
    Token    Value
    "it"     1.21
    "ir"     1.06
    "ing"    1.05
    "ش"      1.04
    "ant"    1.02
    "il"     0.96
    "ے"      0.96
    "?"      0.95
    "ка"     0.94
    "ly"     0.93
    Positive Logits
    Token                Value
    "ي"                  1.35
    "кі"                 1.01
    "י"                  0.98
    (token not shown)    0.95
    " was"               0.94
    " is"                0.85
    " "                  0.84
    (token not shown)    0.83
    "。)"                 0.82
    "。</"                0.79
    Act Density 0.164%

    No Known Activations