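    Negative Logits
        (Tokens are quoted; a leading space or tab is part of the token.)
        Token        Logit
        "ンデ"        -0.07
        "lat"        -0.07
        " Kra"       -0.07
        " Kullan"    -0.07
        " ول"         -0.07
        " fert"      -0.07
        " Dol"       -0.07
        "<Course"    -0.06
        "pll"        -0.06
        " вла"       -0.06

    Positive Logits
        Token        Logit
        " box"        0.13
        " Box"        0.12
        "Box"         0.12
        "BOX"         0.11
        " boxes"      0.10
        "box"         0.10
        "boxed"       0.10
        "ibox"        0.09
        " BOX"        0.09
        "\tbox"       0.09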
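
    The positive and negative logit lists are commonly computed as follows in SAE feature dashboards. The feature's decoder direction is multiplied by the model's unembedding matrix, which gives the feature's direct effect on every output token's logit. The list then reports the largest and smallest entries. The sketch below assumes that definition. The names `logit_attribution`, `w_dec_row`, `w_u` and `vocab` are hypothetical, and this dashboard does not say whether it normalizes either vector first.

```python
import numpy as np

def logit_attribution(w_dec_row, w_u, vocab, k=10):
    """Project one feature's decoder direction through the unembedding.

    Assumed inputs (hypothetical names, not a specific library's API):
      w_dec_row: (d_model,) decoder vector for the feature
      w_u:       (d_model, d_vocab) unembedding matrix
      vocab:     list of d_vocab token strings
    Returns (negative, positive): k (token, value) pairs each,
    most extreme first.
    """
    logits = w_dec_row @ w_u          # (d_vocab,) direct effect on each token's logit
    order = np.argsort(logits)        # indices sorted from most negative to most positive
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage: 2-dim residual stream, 4-token vocabulary (made-up numbers).
vocab = ["a", "b", "c", "d"]
w_u = np.array([[0.1, -0.2, 0.3, 0.0],
                [0.5, 0.5, 0.5, 0.5]])
neg, pos = logit_attribution(np.array([1.0, 0.0]), w_u, vocab, k=2)
print(neg)  # most negative tokens first
print(pos)  # most positive tokens first
```

    Under this reading, the uniformly "box"-like positive list means that writing this feature into the residual stream directly raises the logits of box-spelled tokens.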
    Activation density: 0.026%
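
    The density figure is a firing rate: 0.026% means roughly 26 of every 100,000 sampled tokens activate the feature. The sketch below shows that calculation. It assumes density is the percentage of tokens whose activation exceeds zero. The dataset and sample size behind this dashboard's figure are not given, and the function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of sampled tokens on which the feature fires.

    acts:      feature activations, one per token (hypothetical input)
    threshold: activation must exceed this to count as firing (assumed 0)
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 26 firing tokens out of 100,000 corresponds to a density of 0.026%.
acts = np.zeros(100_000)
acts[:26] = 1.0
print(activation_density(acts))
```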

    No Known Activations