INDEX
    Explanations

    phrases indicating tendencies or behaviors

    Negative Logits
    adia       -0.18
    oste       -0.17
    aving      -0.17
    esan       -0.16
    icism      -0.15
    opard      -0.15
    奏         -0.15
    idable     -0.14
    ιών        -0.14
    ourd       -0.14
    Positive Logits
    erness      0.28
    ENCIES      0.20
     tend       0.19
    entially    0.18
     tends      0.18
     toward     0.16
    encias      0.16
    entious     0.16
    reds        0.16
    ży          0.15
    Act Density 0.009%

    No Known Activations