    Explanations

    occurrences of the word "which"

    Negative Logits
    Token       Logit
    a           -0.66
    e           -0.60
    age         -0.59
    ed          -0.58
    i           -0.57
    '           -0.56
    P           -0.55
    C           -0.55
    ee          -0.55
    cy          -0.55

    Positive Logits (␣ marks a leading space in the token)
    Token       Logit
    ]**          0.87
    ␣we          0.86
    soever       0.85
    تقاوى        0.84
    ␣means       0.80
    ␣they        0.77
    ]--;         0.76
    ]+"          0.75
    RTLD         0.75
    "]}          0.73
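On feature dashboards of this kind, the two logit lists are commonly read as the vocabulary tokens whose output logits the feature's decoder direction most suppresses (negative) or most promotes (positive). The following is a minimal sketch of how such lists can be produced. It assumes that the feature's effect is estimated by projecting its decoder vector through the model's unembedding matrix. `top_logit_tokens`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and this is not necessarily how the values above were computed.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature direction shifts their logits.

    decoder_dir: shape (d_model,), hypothetical feature decoder vector
    W_U:         shape (d_model, vocab_size), hypothetical unembedding matrix
    vocab:       list of token strings of length vocab_size
    Returns (negative, positive): k (token, value) pairs each,
    most negative first and most positive first respectively.
    """
    logit_effect = decoder_dir @ W_U      # direct effect on each token's logit
    order = np.argsort(logit_effect)      # indices sorted ascending by effect
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logit_tokens(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 0.9],
              [0.0,  0.0, 0.0]]),
    ["a", " we", "which"],
    k=1,
)
print(neg, pos)
```

Ranking by the raw projection keeps the sketch simple. Real dashboards may normalize or scale these values differently.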
    Activation density: 0.167%
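Activation density is usually understood as the share of token positions on which the feature fires. The following is a minimal sketch that assumes "fires" means an activation strictly above a hypothetical `threshold` (0 by default). Under that reading, a density of 0.167% means the feature is active on roughly 1 in 600 tokens. `activation_density` and its inputs are illustrative names, not a known API.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Return the fraction of token positions where the activation exceeds threshold.

    acts: 1-D array of one feature's activations, one value per token (hypothetical input).
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: a single active position among 600 tokens.
acts = np.zeros(600)
acts[42] = 3.1
density = activation_density(acts)
print(f"Activation density: {density * 100:.3f}%")  # prints "Activation density: 0.167%"
```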

    No Known Activations