INDEX
    Explanations

    references to various types of media and their characteristics

    Negative Logits
    oice    -0.17
    vang    -0.15
    åĤ      -0.15
     hans   -0.15
    iele    -0.15
    the     -0.15
     Sacr   -0.14
    orge    -0.14
    100     -0.14
    iek     -0.14
    Positive Logits
     we     0.29
     she    0.27
     they   0.26
     mình   0.25
     you    0.23
     he     0.22
     она    0.19
     они    0.19
    我们    0.17
    ตน      0.17
    Act Density 0.366%

    No Known Activations