    Explanations

    phrases that pose questions or express doubts

    Negative Logits
    quer        -0.68
    hack        -0.64
    gain        -0.60
    quin        -0.59
    ument       -0.59
    query       -0.57
    athon       -0.56
    brid        -0.55
     Radius     -0.54
    iy          -0.54
    Positive Logits
     they        0.85
     THEY        0.79
    atta         0.68
    Filename     0.68
     "[          0.63
     he          0.62
     she         0.61
    they         0.60
    ="/          0.60
    ãĢİ          0.60
    Act Density 0.187%
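
The density figure above is commonly read as the fraction of token positions on which the feature fires. The page does not state how it was computed, so the sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active (activation > threshold). The source page does not define it.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1,000 token positions, 2 of which have non-zero activation.
acts = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(acts):.3%}")  # prints 0.200%
```

Under this reading, a density of 0.187% would correspond to the feature firing on roughly 2 out of every 1,000 tokens.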

    No Known Activations