INDEX
    Explanations
    Negative Logits
    " gug"        -0.08
    "-growth"     -0.08
    " Growth"     -0.07
    "Growth"      -0.07
    " arches"     -0.07
    "_growth"     -0.07
    "graf"        -0.07
    " begrü"      -0.07
    "President"   -0.07
    " impos"      -0.07
    Positive Logits
    "iede"          0.09
    " 안전"          0.09
    " sicheren"     0.08
    " وعدم"         0.08
    " divulgação"   0.08
    "하십시오"        0.08
    " divulg"       0.08
    " ذخ"           0.08
    " 저장"          0.08
    0.08
    Act Density 0.011%

    No Known Activations