    Negative Logits
    " благ"                 -0.07
    (token not shown)       -0.07
    " erw"                  -0.07
    " allocation"           -0.07
    "pragma"                -0.07
    " bezpo"                -0.07
    " grape"                -0.07
    ".assertAlmostEqual"    -0.07
    " totalPages"           -0.07
    (token not shown)       -0.06
    Positive Logits
    " Kansas"               0.09
    " pushed"               0.08
    " Source"               0.07
    "_layer"                0.07
    "\tnew"                 0.07
    "mızda"                 0.07
    (token not shown)       0.07
    "Fake"                  0.07
    " creek"                0.07
    "kowski"                0.07
    Activation Density: 0.002%

    No Known Activations