INDEX
    Explanations
    Negative Logits
    " convention"           -0.08
    " nose"                 -0.08
    " интеллекту"           -0.08
    "cer"                   -0.08
    " intelectual"          -0.07
    " hemisphere"           -0.07
    " Nose"                 -0.07
    " intellect"            -0.07
    (token not rendered)    -0.07
    " drilling"             -0.07
    Positive Logits
    " apology"              0.10
    "公告"                  0.10
    " apolog"               0.09
    " apologized"           0.09
    " رم"                   0.09
    " Posted"               0.09
    "Posted"                0.09
    "Liebe"                 0.09
    " posted"               0.09
    " emoji"                0.09
    Act Density 0.070%

    No Known Activations