INDEX
    Explanations
    Negative Logits
    " योगदान"       -0.08
    " calibrated"   -0.08
    "(coord"        -0.08
    "ensitive"      -0.08
    " फेर"          -0.08
    " unaffected"   -0.08
    " blauwe"       -0.07
    " parasite"     -0.07
    "āju"           -0.07
    " platinum"     -0.07
    Positive Logits
    "/theme"        0.08
    ".spring"       0.08
    "/topic"        0.08
    " vague"        0.08
    "-hot"          0.08
    " prompts"      0.08
    ".todo"         0.08
    " стих"         0.07
    " dotyczą"      0.07
    " запрос"       0.07
    Activation Density: 0.009%

    No Known Activations