INDEX
    Explanations

    recommendations and suggestions

    NEGATIVE LOGITS
    à¥ģà¤Ŀ       -0.16
    quin         -0.15
    how          -0.15
    ÑĢеб         -0.15
    utin         -0.14
    about        -0.13
    оваÑĤÑĮÑģÑı  -0.13
    pping        -0.13
    aldi         -0.13
    ÑĢаÑģ        -0.13
    POSITIVE LOGITS
     against   0.25
     Against   0.20
     you       0.20
    against    0.19
     strongly  0.19
     everyone  0.19
    Against    0.18
    /request   0.17
     avoiding  0.17
     anyone    0.17
    Activation Density: 0.055%

    No Known Activations