    Explanations

    questions and their corresponding structures in text

    Negative Logits
    Token      Logit
    "ophon"    -0.19
    "oders"    -0.15
    " Sort"    -0.15
    "Sort"     -0.15
    "ÑĥÑħ"     -0.14
    "nette"    -0.14
    "èĤ¥"      -0.14
    "ká"       -0.13
    "Make"     -0.13
    "lineno"   -0.13
    Positive Logits
    Token      Logit
    " how"     0.27
    " what"    0.24
    " How"     0.23
    " whom"    0.22
    " Does"    0.21
    " Which"   0.21
    " why"     0.20
    " cui"     0.20
    " who"     0.20
    "how"      0.19
    Act Density 0.142%

    No Known Activations