INDEX
    Explanations
    Negative Logits
     indirectly        0.41
     neglects          0.40
     indirect          0.40
    プレゼント         0.40
    보니               0.40
     प्रत्यक्ष           0.39
    (no token text)    0.39
    </caption>         0.38
     overlooks         0.38
     olvides           0.37
    Positive Logits
     gracefully        0.87
     calmly            0.71
     politely          0.70
     graceful          0.61
     gently            0.56
     regroup           0.56
     వెంటనే            0.56
     promptly          0.53
     deftly            0.53
    迅速               0.52
    Act Density 0.014%

    No Known Activations