INDEX
    Explanations
    Negative Logits
     zone  -0.06
     aluminum  -0.06
    ليم  -0.06
     Surf  -0.06
     Dean  -0.06
     theatre  -0.06
     travels  -0.06
     dint  -0.06
     refusal  -0.06
    _sum  -0.06
    Positive Logits
    .LabelControl  0.08
    Cheers  0.07
    	pthread  0.07
    	pub  0.06
    κι  0.06
    Marshal  0.06
     ışı  0.06
    exclusive  0.06
     moderators  0.06
     особенно  0.06
    Act Density 0.002%

    No Known Activations