INDEX
    Explanations

    equals sign

    Negative Logits
    " Hezbollah"       -0.09
    " Scrum"           -0.09
    " Tweet"           -0.09
    " Timestamp"       -0.09
    " volunteering"    -0.09
    " فوائد"           -0.08
    " Elke"            -0.08
    " Tweets"          -0.08
    "cussion"          -0.08
    " excuse"          -0.08
    Positive Logits
    " distant"         0.09
    "phäre"            0.09
    "(-"               0.09
    " (-"              0.09
    "距离"             0.09
    " relative"        0.08
    " perturb"         0.08
    " ±"               0.08
    " displaced"       0.08
    " distances"       0.08
    Activation Density: 0.007%

    No Known Activations