INDEX
    Explanations

    improves or correlates positively

    Negative Logits
    " nữa"        0.93
    " perfect"    0.79
    "还需要"       0.79
    " lets"       0.77
    " perfetto"   0.76
    " perfetta"   0.75
    " perfecto"   0.73
    "ちゃう"       0.72
    " परफेक्ट"     0.71
    "্যাব"         0.71
    Positive Logits
    " improves"       1.79
    " Improves"       1.58
    " positively"     1.57
    " improve"        1.55
    " significantly"  1.50
    "Improved"        1.43
    "improve"         1.42
    " correlated"     1.42
    " correlates"     1.41
    " improved"       1.39
    Act Density 0.739%

    No Known Activations