INDEX
    Explanations

    instances of asking and responding to questions

    Negative Logits
    Token        Logit
    "леж"        -0.16
    "uve"        -0.16
    " trời"      -0.15
    "aphael"     -0.14
    "orget"      -0.14
    "viso"       -0.14
    "_UNS"       -0.14
    "quate"      -0.13
    "inish"      -0.13
    "gae"        -0.13
    Positive Logits
    Token            Logit
    " questions"     0.34
    " if"            0.34
    " what"          0.34
    " whether"       0.34
    " why"           0.33
    " about"         0.32
    " permission"    0.30
    " how"           0.28
    " point"         0.27
    "what"           0.24
    Activation Density: 0.050%

    No Known Activations