INDEX
    Explanations

    phrases indicating commitment and support

    NEGATIVE LOGITS
    "yn"        -0.19
    "kind"      -0.17
    "IVITY"     -0.17
    " quite"    -0.15
    " kind"     -0.15
    "brtc"      -0.15
    "Ñħод"      -0.14
    " Quite"    -0.14
    "inement"   -0.14
    "_kind"     -0.14
    POSITIVE LOGITS
    " sound"     0.17
    "exus"       0.17
    " friction"  0.16
    " robust"    0.16
    "rob"        0.15
    "commit"     0.15
    " suite"     0.15
    "tera"       0.15
    "achts"      0.15
    "997"        0.14
    Activation Density: 0.228%

    No Known Activations