    Negative Logits
    " Thinking"    -1.63
    " thinking"    -1.54
    "Thinking"     -1.52
    "thinking"     -1.45
    "Think"        -1.27
    " THINK"       -1.25
    "THINK"        -1.25
    " Think"       -1.23
    "think"        -1.15
    " thinker"     -0.80
    Positive Logits
    "osoba"               0.69
    "Искәрмәләр"          0.66
    "MessageTagHelper"    0.66
    "ItemBackground"      0.59
    " насељу"             0.59
    " CWE"                0.57
    "مصادر"               0.56
    "AsUp"                0.55
    "of"                  0.55
    " Italijani"          0.54
    Activation Density: 0.112%

    No Known Activations