    NEGATIVE LOGITS
    " narrowly"             -0.10
    " MDT"                  -0.07
    " kinn"                 -0.07
    "下降"                  -0.07
    " KH"                   -0.07
    " ART"                  -0.07
    (token text missing)    -0.07
    "很好"                  -0.07
    " MOR"                  -0.07
    "最好"                  -0.07
    POSITIVE LOGITS
    " onwards"     0.12
    " onward"      0.11
    ".REG"         0.09
    " વાયર"        0.08
    " Educação"    0.08
    " außer"       0.08
    " הול"         0.07
    " Sip"         0.07
    "ві"           0.07
    "גש"           0.07
    Activation Density: 0.060%

    No Known Activations