INDEX
    NEGATIVE LOGITS
    =h
    -0.07
    _passwd
    -0.07
    +s
    -0.07
    iculty
    -0.07
    -distance
    -0.06
    (norm
    -0.06
     інтерес
    -0.06
     ihtiy
    -0.06
     Burr
    -0.06
    	G
    -0.06
    POSITIVE LOGITS
     examples
    0.10
     example
    0.07
    0.06
     enlightened
    0.06
     Supplement
    0.06
     например
    0.06
    がない
    0.06
    :"<<
    0.06
     REPRESENT
    0.06
    Examples
    0.06
    Activation Density: 0.028%

    No Known Activations