INDEX
    NEGATIVE LOGITS
    "heits"         -0.10
    "Shrink"        -0.08
    "למיד"          -0.08
    " inflated"     -0.08
    "Infl"          -0.08
    "ned"           -0.08
    "heids"         -0.08
    "Sar"           -0.07
    " Leadpages"    -0.07
    " sabot"        -0.07
    POSITIVE LOGITS
    " Govern"       0.08
    " Movies"       0.08
    "_movies"       0.08
    " roman"        0.08
    " calme"        0.07
    " Copp"         0.07
    " SPO"          0.07
    " lenta"        0.07
    "/videos"       0.07
    " cinemat"      0.07
    Activation Density: 0.007%

    No Known Activations