    Explanations

    phrases indicating a sense of difficulty or challenge

    Head Attr Weights
    0:0.01
    1:0.03
    2:0.14
    3:0.13
    4:0.01
    5:0.03
    6:0.05
    7:0.11
    8:0.12
    9:0.17
    10:0.05
    11:0.08
    Negative Logits
    " overest"       -1.10
    " introduced"    -1.08
    " except"        -1.01
    " introducing"   -1.00
    " bust"          -0.99
    " exagger"       -0.98
    " teased"        -0.97
    " tease"         -0.96
    " frown"         -0.94
    "imar"           -0.94
    Positive Logits
    (token not rendered)   1.29
    " 裏�"                 1.25
    "ット"                 1.23
    "fficiency"            1.22
    (token not rendered)   1.20
    (token not rendered)   1.20
    "oplan"                1.18
    " whereabouts"         1.18
    "appiness"             1.17
    "otrop"                1.16
    Act Density 0.009%

    No Known Activations