INDEX
    Explanations

    phrases that express personal opinions or subjective preferences

    Negative Logits
     my
    -0.16
    私の
    -0.15
    Remark
    -0.14
     Remark
    -0.14
     мо
    -0.14
    uname
    -0.14
    alar
    -0.14
    enate
    -0.14
     Incredible
    -0.14
    ンデ
    -0.13
    Positive Logits
     prefer
    0.19
     personally
    0.19
     hearing
    0.18
     whenever
    0.18
    prefer
    0.18
     anything
    0.17
     gim
    0.17
     preference
    0.17
     Favorite
    0.17
    pref
    0.16
    Act Density 0.175%
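
The dashboard does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of token positions on which the feature's activation is above zero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample counts are hypothetical; the counts were chosen only so that the result matches 0.175%.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 7 of 4000 token positions.
sample = [0.0] * 3993 + [0.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")  # 0.175%
```

Under this reading, a density of 0.175% means the feature is active on fewer than 2 of every 1000 tokens, which fits a narrow feature tied to words such as " prefer" and " personally".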

    No Known Activations