    Negative Logits
    Token           Logit
    " fld"          -0.08
    " Knowledge"    -0.07
    " speci"        -0.07
    " Force"        -0.07
    " source"       -0.07
    " نسخه"         -0.07
    " discharge"    -0.07
    " TF"           -0.06
    "\tTR"          -0.06
    " 자유"          -0.06
    Positive Logits
    Token           Logit
    "asmine"        0.07
    "continental"   0.07
    "ancellor"      0.07
    "PLAIN"         0.07
    "lan"           0.07
    ".Singleton"    0.07
    "plain"         0.07
    "bn"            0.07
    "REDENTIAL"     0.07
    "Left"          0.06
    Activation Density: 0.005%

    No Known Activations