    Explanations
    Negative Logits
    Token            Logit
    "Foot"           -0.07
    "\tevent"        -0.07
    " Germans"       -0.07
    " campground"    -0.07
    "regnum"         -0.07
    "registro"       -0.06
    "Bind"           -0.06
    ",…↵↵"           -0.06
    "fld"            -0.06
    " Zw"            -0.06
    Positive Logits
    Token            Logit
    " hx"            0.06
    "342"            0.06
    (not shown)      0.06
    "̂"              0.06
    "816"            0.06
    "نامه"           0.06
    "ulent"          0.05
    " archit"        0.05
    "@endif"         0.05
    "ческий"         0.05
    Activation Density: 0.007%

    No Known Activations