    Negative Logits
    Token              Logit
    " ?)"              -0.09
    " circumstance"    -0.08
    " hinweg"          -0.08
    " тава"            -0.08
    " vencer"          -0.08
    " nincs"           -0.08
    " (?"              -0.07
    " gihugu"          -0.07
    " Тур"             -0.07
    " SYS"             -0.07
    Positive Logits
    Token              Logit
    " structured"      0.09
    " concise"         0.08
    " blueprint"       0.08
    " bullet"          0.08
    " mimi"            0.08
    "structured"       0.08
    "Structured"       0.07
    " framed"          0.07
    "具体"             0.07
    "dien"             0.07
    Activation Density: 0.061%

    No Known Activations