INDEX
    Explanations
    Negative Logits
    " love"      -1.45
    " LOVE"      -1.01
    " Love"      -0.95
    "love"       -0.90
    "Love"       -0.84
    "LOVE"       -0.81
    " الحب"      -0.72
    " liefde"    -0.67
    " want"      -0.66
    " luv"       -0.66
    Positive Logits
    " to"            0.78
    "about"          0.66
    "afficheront"    0.64
    " rubio"         0.60
    "oa̍t"            0.60
    "Brainz"         0.59
    " ویکی‌پدیا"      0.58
    " mergeFrom"     0.56
    "ings"           0.56
    " adherent"      0.56
    Activation Density: 0.018%

    No Known Activations