INDEX
    Explanations

    references and citations in various formats

    Negative Logits
    "138"      -0.18
    "̣"        -0.16
    "etik"     -0.16
    " ec"      -0.15
    "217"      -0.15
    "reff"     -0.15
    "pak"      -0.15
    "176"      -0.14
    "994"      -0.14
    "emann"    -0.14
    Positive Logits
    " приход"   0.16
    "ickt"      0.16
    "rama"      0.15
    " же"       0.14
    "peq"       0.14
    "ulus"      0.14
    "ocht"      0.14
    "stra"      0.14
    "atform"    0.14
    "strate"    0.13
    Activation Density: 0.004%

    No Known Activations