INDEX
    Explanations
    Negative Logits
    " striped"         -0.09
    " framed"          -0.08
    " Hive"            -0.07
    " EE"              -0.07
    "kra"              -0.07
    " Bulls"           -0.07
    " Plush"           -0.07
    " Jam"             -0.07
    " controls"        -0.07
    " crab"            -0.07
    Positive Logits
    " indefinitely"    0.11
    "doctoral"         0.08
    "vin"              0.08
                       0.08
    "wards"            0.08
    "-fashioned"       0.08
    " elic"            0.08
    "	de"            0.08
    " cred"            0.08
    "serve"            0.08
    Act Density 0.025%

    No Known Activations