    NEGATIVE LOGITS
    Token                 Logit
    " emitted"            -0.08
    "rgb"                 -0.08
    " emoji"              -0.08
    "\tsize"              -0.07
    "بدأ"                 -0.07
    (token not shown)     -0.07
    (token not shown)     -0.07
    "Barry"               -0.07
    "artz"                -0.07
    (token not shown)     -0.07
    POSITIVE LOGITS
    Token                 Logit
    "liness"              0.10
    "pari"                0.08
    " unfair"             0.08
    " unrealistic"        0.08
    " disrespect"         0.08
    " unnatural"          0.08
    " unreasonable"       0.08
    " Islanders"          0.08
    " jika"               0.08
    "atisfactory"         0.07
    Activation Density: 0.033%

    No Known Activations