INDEX
    Explanations

    numerical values and percentages

    Negative Logits
    oen          -0.16
    urum         -0.14
    doi          -0.14
    gne          -0.14
    alom         -0.14
    γκ           -0.14
    294          -0.13
    orch         -0.13
    sdale        -0.13
    aneously     -0.13
    Positive Logits
    olor          0.15
    atab          0.14
     baz          0.14
    agit          0.14
    INI           0.14
    edor          0.13
    count         0.13
    oses          0.13
     mor          0.13
    positor       0.13
    Activation Density: 0.007%

    No Known Activations