    Negative Logits
    " ξεκ"              0.60
    (token not shown)   0.58
    " апре"             0.57
    " адап"             0.56
    " अवस्था"            0.55
    " осе"              0.55
    "一边"              0.54
    "一层"              0.54
    " идеи"             0.53
    "ج"                 0.53
    Positive Logits
    " answer"           1.09
    " answering"        1.08
    " menjawab"         1.06
    " Answers"          1.04
    "answer"            1.03
    " beantwort"        1.03
    " Answer"           1.00
    "Answer"            0.97
    " answers"          0.96
    (token not shown)   0.95
    Activation Density: 0.159%

    No Known Activations