INDEX
    Explanations

    code structure and paths

    Negative Logits
    ulent       0.43
    indra       0.42
    KX          0.41
    uye         0.41
    0           0.40
    vo          0.40
    kha         0.40
    propion     0.39
                0.39
    𝟬           0.39
    Positive Logits
     erkl             0.43
    үз                0.42
     mutlaka          0.41
     razum            0.41
     unidirectional   0.40
    .}\               0.38
     unbedingt        0.37
     üt               0.37
     filo             0.37
     znacznie         0.37
    Act Density 0.000%

    No Known Activations