    Negative Logits
    Token               Value
    "Y"                 0.37
    "Tanh"              0.34
    " erste"            0.33
    "E"                 0.32
    "A"                 0.32
    "PI"                0.31
    " خاصيه"            0.31
    " Eigenschaften"    0.30
    " ανε"              0.30
    (no token shown)    0.30
    Positive Logits
    Token               Value
    " at"               0.43
    " helping"          0.39
    " facilitating"     0.36
    " promoting"        0.35
    (no token shown)    0.35
    "isted"             0.34
    " enjoying"         0.33
    " listening"        0.33
    "ruiting"           0.33
    (no token shown)    0.33
    Activation Density: 0.137%

    No Known Activations