INDEX
    Explanations

    Instances where the word "which" is followed by the number 9 or 10.

    Negative Logits
    Token           Logit
    "politics"      -0.78
    "let"           -0.75
    "LET"           -0.68
    "Calling"       -0.62
    "CLOSE"         -0.62
    " Calling"      -0.61
    "UGH"           -0.60
    " dividing"     -0.60
    " RTX"          -0.59
    "UG"            -0.59

    Positive Logits
    Token           Logit
    " originated"   0.85
    " resulted"     0.85
    " contributed"  0.82
    " involve"      0.80
    " lasted"       0.79
    " derive"       0.78
    " consisted"    0.78
    " exceeded"     0.77
    " specialize"   0.77
    " yielded"      0.77
    Act Density 0.022%

    No Known Activations