    Negative Logits
        -0.08  " markdown"
        -0.07  " rese"
        -0.07  " omp"
        -0.07  " pride"
        -0.07  "-------------↵"
        -0.07  " &"
        -0.07  " pneumonia"
        -0.07  "oder"
        -0.07  "--------------------------------------------------------------------------↵"
        -0.07  "νώ"
    Positive Logits
         0.09  " гг"
         0.09  (token not shown)
         0.08  " Viewed"
         0.08  " hinweg"
         0.08  "_strings"
         0.08  " 기업"
         0.08  " Tiere"
         0.08  " Redes"
         0.08  " בר"
         0.08  "were"
    Activation Density: 0.015%

    No Known Activations