INDEX
    Explanations

    direct quotes from individuals

    Negative Logits
    s          -0.69
    ه          -0.32
    sburg      -0.31
    sian       -0.29
    ska        -0.28
    a          -0.26
    ς          -0.25
    sand       -0.24
    न          -0.23
    sik        -0.23
    Positive Logits
    atre            0.16
    wahl            0.15
    ertest          0.15
    odore           0.15
    bsites          0.14
    geber           0.14
    gether          0.14
     بوابة          0.14
    .Abstractions   0.14
    دواج            0.14
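On dashboards like this one, the positive and negative logit lists are usually obtained by projecting the feature's decoder direction through the model's unembedding matrix and keeping the extreme values. The sketch below assumes that convention. The names `top_logits`, `w_dec`, `W_U`, and the toy tokens are hypothetical, and the weights are made up rather than taken from this feature.

```python
import numpy as np

def top_logits(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs
    for one feature, assuming logits = decoder direction @ unembedding.

    w_dec: the feature's decoder vector, shape (d_model,)   (assumed layout)
    W_U:   unembedding matrix, shape (d_model, vocab_size)   (assumed layout)
    vocab: token strings, indexed like the columns of W_U
    """
    logits = w_dec @ W_U              # direct effect on every token's logit
    order = np.argsort(logits)        # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with made-up weights (d_model = 2, four hypothetical tokens).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
w_dec = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.7, -0.2, 0.2],
                [0.3,  0.1,  0.4, 0.0]])
negative, positive = top_logits(w_dec, W_U, vocab, k=2)
print("negative:", negative)   # tok_b (-0.7), tok_c (-0.2)
print("positive:", positive)   # tok_a (0.5), tok_d (0.2)
```

Sorting the full vocabulary once and slicing both ends keeps the two lists consistent with each other; for large vocabularies `np.argpartition` would avoid the full sort.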
    Act Density 0.086%
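Act Density is commonly read as the percentage of evaluated tokens on which the feature's activation is above zero, so 0.086% corresponds to roughly 86 firing tokens per 100,000. A minimal sketch under that assumption follows; the function name `activation_density` and its `threshold` parameter are hypothetical, and the activations are synthetic.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Synthetic activations: the feature fires on 86 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:86] = 1.0
print(f"{activation_density(acts):.3f}%")   # 0.086%
```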

    No Known Activations