INDEX
    Explanations

    positive expressions referring to commitment, drive, and love

    Negative Logits
    Token      Logit
    kasa       -1.50
    umo        -1.48
    lele       -1.45
    jaya       -1.43
    levis      -1.42
    hina       -1.41
    mef        -1.38
    lyon       -1.38
    kug        -1.37
    makro      -1.36
    Positive Logits
    Token      Logit
    <bos>      0.83
    himself    0.74
    always     0.64
    proud      0.63
    loves      0.63
    feels      0.62
    believes   0.62
    enjoys     0.61
    prefers    0.59
    still      0.58
    Activation Density: 0.345%

    No Known Activations