INDEX
    Explanations

    explicit mentions of personal experiences and opinions

    Negative Logits
    Token                 Logit
    " DialogInterface"    -0.51
    "сми"                 -0.46
    "findpost"            -0.46
    "おお"                -0.46
    "IsContent"           -0.43
    "好在"                -0.43
    "зин"                 -0.42
    "mybatisplus"         -0.42
    " للمعارف"            -0.42
    " كومونز"             -0.42
    Positive Logits
    Token                 Logit
    " simply"             3.49
    " just"               3.49
    "just"                3.17
    "simply"              3.11
    " simplemente"        3.03
    " merely"             2.87
    " semplicemente"      2.79
    "Just"                2.76
    " просто"             2.73
    "Simply"              2.72
    Activation Density: 1.009%

    No Known Activations