INDEX
    Explanations
    No Explanations Found
    Negative Logits
     بعدها    0.62
    这点    0.55
     dirs    0.54
     أيضا    0.53
     uniqu    0.53
     wept    0.53
     glimps    0.52
     quizá    0.52
     впоследствии    0.52
    0.52
    Positive Logits
     Answer    0.78
     answer    0.75
     😊    0.74
     답변    0.73
    Answer    0.71
    Explanation    0.65
    swering    0.64
    ANSWER    0.63
     Explanation    0.61
     :)    0.61
    Activation Density: 6.227%

    No Known Activations