INDEX
    Explanations

    phrases expressing the difficulty or ease of a task or process

    Negative Logits
    Token               Logit
    " frank"            -0.17
    "Drug"              -0.14
    " Drug"             -0.14
    " عد"               -0.14
    "iscrim"            -0.14
    "thren"             -0.13
    " justification"    -0.13
    "bell"              -0.13
    " Datum"            -0.13
    " Bell"             -0.13
    Positive Logits
    Token               Logit
    " easier"           0.52
    "asier"             0.36
    " easiest"          0.32
    " easy"             0.31
    "easy"              0.29
    "Easy"              0.28
    " Easy"             0.27
    " eas"              0.27
    " fácil"            0.27
    "易"                0.26
    Act Density 0.062%

    No Known Activations