INDEX
    Explanations

    high-frequency function words or common linguistic structures

    Negative Logits
    abase         -0.17
    izzo          -0.17
    rvé           -0.16
    ãĥ«ãĤ¯        -0.15
     imp          -0.15
    оже           -0.14
     Bern         -0.14
     Rosenstein   -0.14
    irst          -0.14
    agraph        -0.14
    Positive Logits
    ija           0.17
    dorf          0.15
    uros          0.15
    lers          0.15
    ler           0.15
    йн            0.15
    detect        0.14
    endor         0.14
    ario          0.14
    alis          0.14
    Act Density 0.000%

    No Known Activations