INDEX
    Explanations
    Negative Logits
     neuve             -1.05
     mirar             -0.98
     bzw               -0.93
     передний          -0.91
     Symptome          -0.91
    GUILayout          -0.91
     THAT              -0.91
     intellectuelle    -0.90
    をします           -0.90
    vább               -0.89
    Positive Logits
     commer                    1.16
     künftig                   1.01
     genügend                  0.98
     dasselbe                  0.94
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵         0.91
     byť                       0.90
     dieselbe                  0.90
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵      0.90
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵           0.90
     yaka                      0.89
    Act Density 0.090%

    No Known Activations