INDEX
    Explanations
    Negative Logits
    Token             Logit
    " psychotic"      -0.06
    " например"       -0.06
    "Titan"           -0.06
    " Wang"           -0.06
    " Однако"         -0.06
    "(['/"            -0.06
    "Texture"         -0.06
    "thesize"         -0.06
    ")↵↵↵↵↵↵↵↵"       -0.06
    "_step"           -0.06
    Positive Logits
    Token             Logit
    "={}"             0.07
    "(cube"           0.06
    " Τ"              0.06
    "Tele"            0.06
    " readability"    0.06
    " architecture"   0.06
    " Witness"        0.06
    "preci"           0.06
    " Term"           0.06
    " Spor"           0.06
    Act Density 0.005%

    No Known Activations