    Explanations

    expressions indicating measurement, evaluation, or comparison

    Negative Logits
    Token               Logit
    "idor"              -0.15
    " æ¾"               -0.15
    "ayo"               -0.15
    "FER"               -0.14
    " sophistication"   -0.14
    "/xhtml"            -0.14
    "ãĥ¼ãĥŀ"            -0.13
    " southern"         -0.13
    "agem"              -0.13
    " sophisticated"    -0.13
    Positive Logits
    Token               Logit
    " hard"             0.45
    " difficult"        0.40
    "hard"              0.37
    " harder"           0.36
    " Hard"             0.36
    " hardest"          0.35
    "Hard"              0.34
    " HARD"             0.34
    "-hard"             0.33
    " difficulty"       0.31
    Activation Density: 0.015%

    No Known Activations