    Negative Logits
    -0.08
     पढ़
    -0.07
     önemlidir
    -0.06
    _slope
    -0.06
    Cop
    -0.06
    ner
    -0.06
    -0.06
    ivec
    -0.06
     nb
    -0.06
    nv
    -0.06
    Positive Logits
     Alive
    0.07
    ******/
    0.06
     rallying
    0.06
    	entry
    0.06
     spl
    0.06
    umlu
    0.05
     vice
    0.05
     Furious
    0.05
     Caf
    0.05
     Cartoon
    0.05
    Activation Density: 0.023%

    No Known Activations