    Explanations

    expressions of surprise or emotional reactions

    Negative Logits
    alth       -0.17
    ammen      -0.17
    emode      -0.16
    stellar    -0.15
    ein        -0.15
    opoulos    -0.15
    eous       -0.15
    цип        -0.15
    ernen      -0.15
    iveness    -0.14
    Positive Logits
    sen        0.18
    atz        0.15
    ridge      0.14
    .Sdk       0.14
    980        0.14
    red        0.14
    388        0.14
    sm         0.14
    ata        0.14
    ļĮ         0.13
    Activation Density: 0.017%

    No Known Activations