INDEX
    Explanations
    Negative Logits
    average     -0.07
     kvinna     -0.07
    policy      -0.07
     forearm    -0.07
    	test       -0.06
     hammer     -0.06
     پدر        -0.06
     cabinet    -0.06
    Number      -0.06
    ()[         -0.06
    Positive Logits
     Joy        0.08
    ounces      0.07
    PURE        0.07
    OTS         0.07
    イト        0.07
    joy         0.07
    :ss         0.06
     joy        0.06
    Bright      0.06
    +S          0.06
    Act Density 0.010%

    No Known Activations