INDEX
    Explanations

    references to individuals and their actions or characteristics within a narrative context

    Negative Logits
    uss       -0.16
     Cv       -0.16
    olle      -0.15
     Rue      -0.15
    bn        -0.15
    esco      -0.15
    tiv       -0.14
    ptic      -0.14
     Jam      -0.14
     linear   -0.14
    Positive Logits
    é³        0.16
    atform    0.16
    isson     0.15
    ế         0.15
    ãĥ³ãĤ¸    0.14
    VERRIDE   0.14
    /cop      0.14
    iji       0.14
    çİī       0.14
    adders    0.13
    Act Density 0.134%

    No Known Activations