    Negative Logits
    -0.07
     repo
    -0.07
    буд
    -0.07
     <!--[
    -0.06
    ρών
    -0.06
    (pattern
    -0.06
     soc
    -0.06
    	found
    -0.06
    <E
    -0.06
     offsetX
    -0.06
    Positive Logits
     filmmakers  0.07
     بـ  0.07
     fantast  0.07
    rega  0.07
     harmful  0.07
     improv  0.06
    (DIS  0.06
    pher  0.06
    aldi  0.06
     اساس  0.06
    Act Density 0.006%

    No Known Activations