INDEX
    Explanations

    phrases associated with recommendations and evaluations

    Negative Logits
    " للمعارف"    -0.58
    " سكانية"    -0.57
    "ValueStyle"    -0.54
    " 发表于"    -0.53
    "devším"    -0.52
    " utafitiHapana"    -0.51
    "سطس"    -0.49
    "ؤلاء"    -0.48
    "Пока"    -0.47
    " Toujours"    -0.47
    Positive Logits
    " Reasons"    1.12
    " Best"    1.07
    " Top"    1.06
    " Ways"    1.03
    "Best"    1.01
    "Reasons"    0.99
    " Types"    0.99
    "Top"    0.98
    " top"    0.95
    " reasons"    0.92
    Activation Density: 0.252%

    No Known Activations