    NEGATIVE LOGITS
    Token                 Logit
    [.]                   -2.86
    [ is]                 -2.50
    [2]                   -2.19
    ["]                   -2.11
    [↵↵]                  -2.09
    (token not shown)     -2.02
    [  ]                  -2.02
    [ $]                  -1.96
    [ p]                  -1.92
    [ .]                  -1.85
    POSITIVE LOGITS
    Token                 Logit
    [我们]                 2.31
    [ teater]             2.11
    [genodigd]            2.09
    [ープン]               2.09
    [งิน]                  2.08
    [ dieß]               2.05
    [ Ministero]          2.03
    [ dezelve]            2.03
    (token not shown)     1.96
    [ zoude]              1.95
    Activation Density: 0.003%

    No Known Activations