INDEX
    Explanations
    Negative Logits
     Italijani
    -0.52
    Décès
    -0.49
     تعدى
    -0.47
    fromnode
    -0.45
     Tatsache
    -0.44
    atán
    -0.44
     Überblick
    -0.43
    Veja
    -0.43
     Verhältnisse
    -0.43
     Kreise
    -0.42
    Positive Logits
    ```
    2.14
     ```
    1.42
    ```
    
    1.07
    +```
    0.90
    ````
    0.89
    -```
    0.71
    LikeLike
    0.68
     `
    0.62
     "`
    0.60
     '`
    0.59
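    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). The sketch below is a minimal pure-Python illustration. It assumes the common sparse-autoencoder dashboard convention, which this page does not confirm: each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix W_U, with any final layer norm ignored. The function and argument names are hypothetical, and the toy numbers are illustrative only. They are not this feature's weights.

```python
from typing import List, Sequence, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  k: int = 10) -> Tuple[Scored, Scored]:
    """Return (top_positive, top_negative) lists of (token, score).

    Hypothetical helper. Assumes unembed has shape [d_model][vocab_size]
    and that a token's score is decoder_dir . unembed[:, token].
    """
    d_model = len(decoder_dir)
    scores: Scored = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with column t of W_U.
        s = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        scores.append((token, s))
    top_pos = sorted(scores, key=lambda p: p[1], reverse=True)[:k]
    top_neg = sorted(scores, key=lambda p: p[1])[:k]
    return top_pos, top_neg


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary.
    pos, neg = logit_effects([1.0, -0.5],
                             [[2.0, 0.0, -1.0], [0.0, 1.0, 2.0]],
                             ["```", "Like", " Kreise"], k=2)
    print(pos)  # [('```', 2.0), ('Like', -0.5)]
    print(neg)  # [(' Kreise', -2.0), ('Like', -0.5)]
```

    Both lists are taken from the same scores, so with a tiny vocabulary a token can appear in both. On a real vocabulary of tens of thousands of tokens the two top-k lists do not overlap.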
    Activation Density 0.010%

    No Known Activations
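    An activation density of 0.010% means the feature fired on roughly 1 in 10,000 tokens of whatever sample the dashboard measured. The sketch below is a minimal illustration. It assumes density is the percentage of tokens whose activation is strictly above a threshold of 0. Both the threshold and the function name are hypothetical, not taken from this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # count only tokens where the feature fires
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # One firing token out of 10,000 gives a density of 0.01 (%).
    print(activation_density([0.0] * 9999 + [3.2]))
```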