    Negative Logits
        Token          Logit
        " Saul"        -0.08
        "\tscore"      -0.08
        " CPP"         -0.07
        " денег"       -0.07
        "Behaviour"    -0.07
        " DPR"         -0.07
        " больш"       -0.07
        " Kep"         -0.07
        " Bha"         -0.07
        " starch"      -0.07
    Positive Logits
        Token          Logit
        " curated"     0.18
        "精选"         0.15
        " curate"      0.12
        " lọ"          0.12
        " lựa"         0.12
        " 선정"        0.12
        "-selected"    0.11
        " selection"   0.11
        " seleção"     0.10
        " selectie"    0.10
    Activation Density: 0.039%

    No Known Activations