    Negative Logits
    " Kro"     -0.08
    "olph"     -0.08
    "elapsed"  -0.08
    " Orig"    -0.07
    " Praise"  -0.07
    " Stoff"   -0.07
    " Lub"     -0.07
    " Elimin"  -0.07
    "ثق"       -0.07
    " Miller"  -0.07
    Positive Logits
    " intérieure"  0.10
    " peacefully"  0.08
    "Enough"       0.08
    "ствия"        0.08
    " calm"        0.08
    "ствием"       0.08
                   0.08
    " omp"         0.07
    " অভ"          0.07
                   0.07
    Activation Density: 0.011%

    No Known Activations