    Explanations

    phrases related to simplicity and straightforwardness

    Negative Logits
    ген       -0.15
    lers      -0.14
    ultimate  -0.14
    ngr       -0.14
    leri      -0.14
    DDL       -0.14
    _sink     -0.14
    lide      -0.14
    ub        -0.13
    zel       -0.13
    Positive Logits
    tons      0.32
    ton       0.31
    /simple   0.30
    xes       0.26
    /plain    0.23
    st        0.22
    -minded   0.21
    TON       0.21
    mente     0.19
    -simple   0.19
    Activation Density: 0.044%

    No Known Activations