INDEX
    Explanations

    instances of feedback and evaluation processes

    Negative Logits
    Token          Logit
    " asking"      -0.18
    "hausen"       -0.17
    "QUERY"        -0.16
    "ldb"          -0.16
    "ufe"          -0.15
    "κε"           -0.15
    "orian"        -0.15
    "((["          -0.15
    " escorte"     -0.14
    "anian"        -0.14
    Positive Logits
    Token             Logit
    " answering"      0.20
    " input"          0.19
    " answer"         0.17
    "input"           0.17
    "Input"           0.17
    "-input"          0.16
    " receive"        0.16
    " participate"    0.16
    " completing"     0.16
    " receives"       0.16
    Activation Density: 0.155%

    No Known Activations