INDEX
    Explanations

    phrases related to expectations and evaluations of people's actions or situations

    Negative Logits
    raisal      -0.15
    ¬           -0.15
    :↵↵         -0.14
    isible      -0.14
    ÄIJT        -0.13
     :↵↵        -0.13
    ’ÑĶ         -0.13
    uppe        -0.12
    -%          -0.12
    cpp         -0.12
    Positive Logits
    gether      0.21
    bidden      0.20
    oretical    0.19
    bsites      0.19
    jourd       0.17
    tempts      0.17
    itionally   0.17
    arLayout    0.17
    nger        0.16
    theless     0.16
    Activation Density 0.848%

    No Known Activations