    Negative Logits
    " sekal"       -1.06
    " zelfs"       -0.65
    " וגם"         -0.60
    "Trotz"        -0.59
    "それでも"      -0.57
    " Dennoch"     -0.56
    "itself"       -0.55
    " pourtant"    -0.55
    " samot"       -0.55
    " rağmen"      -0.54
    Positive Logits
    "TagMode"             0.78
    "MessageTagHelper"    0.76
    "/*"                  0.76
    "tdessen"             0.74
    " مشين"               0.71
    "PerformLayout"       0.69
    " either"             0.66
    " &___"               0.65
    "either"              0.65
    "tagHelperRunner"     0.64
    Activation Density: 0.007%

    No Known Activations