INDEX
    Explanations

    instructions followed by a colon

    Negative Logits
    "fourth"         0.35
    " programmer"    0.34
    "rename"         0.34
    "outline"        0.33
    "rary"           0.33
    "programmer"     0.33
    "aryng"          0.32
    "[]}"            0.32
    "auern"          0.32
    "大家"           0.31
    Positive Logits
    " uchun"         0.44
    " Osaka"         0.41
    " voor"          0.41
                     0.41
    "      "         0.40
    " غ"             0.40
    " -"             0.39
    " için"          0.39
    " från"          0.39
    " pierde"        0.39
    Act Density 0.044%

    No Known Activations