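    Negative Logits
        " hingegen"      -0.09   (German, "on the other hand")
        " allerdings"    -0.08   (German, "however")
        " 하지만"        -0.08   (Korean, "but, however")
        "beg"            -0.08
        "fallen"         -0.08
        " revanche"      -0.07   (French, as in "en revanche", "on the other hand")
        " möchtest"      -0.07   (German, "(you) would like")
        " invece"        -0.07   (Italian, "instead")
        "Begin"          -0.07
        " некоторых"     -0.07   (Russian, "of some")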
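    Positive Logits
        " comunque"      0.09    (Italian, "anyway, however")
        " overall"       0.09
        " nonetheless"   0.08
        " целом"         0.08    (Russian, as in "в целом", "on the whole")
        " سي"            0.07    (Arabic-script fragment)
        "ης"             0.07    (Greek word ending)
        "还是"           0.07    (Chinese, "still; or")
        " sigue"         0.07    (Spanish, "continues, still")
        " المهم"         0.07    (Arabic, "the important thing")
        " Anyway"        0.07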
    Activation Density 0.072%

    No Known Activations