INDEX
    Explanations

    phrases indicating understanding, agreement, or comprehension

    statements of comprehension or understanding

    Negative Logits
    rouse       -0.76
    woods       -0.72
    rock        -0.70
    strip       -0.69
    die         -0.68
    metal       -0.67
    onies       -0.67
    Ranked      -0.67
    pload       -0.65
    endar       -0.64
    Positive Logits
     understands    0.74
     Duc            0.71
    ĺħ              0.71
    ible            0.71
     Understand     0.70
     Stafford       0.70
    ably            0.69
     Languages      0.68
    ances           0.67
    iotic           0.67
    Activation Density: 0.034%

    No Known Activations