    Negative Logits
     InputDecoration   -0.81
    DoubleQuotes       -0.76
     مشين              -0.72
     ujednoznacz       -0.66
    TagHelper          -0.66
    yntaxException     -0.64
    SharedCtor         -0.62
    lippe              -0.60
    /−                 -0.58
     myſelf            -0.58
    Positive Logits
     love     0.82
    love      0.71
     LOVE     0.60
    Love      0.59
     Love     0.56
    LOVE      0.55
     esist    0.52
     life     0.51
     любовь   0.49
     liefde   0.49
    Act Density 0.617%

    No Known Activations