INDEX
    Explanations

    instances of specific items or groups within larger categories

    the word "which" in various contexts

    Negative Logits
    "let"         -0.72
    "nor"         -0.70
    "politics"    -0.66
    "hat"         -0.63
    "LET"         -0.61
    "fitting"     -0.60
    " Binding"    -0.59
    "quote"       -0.59
    "-+"          -0.59
    "dl"          -0.59
    Positive Logits
    " originated"     0.91
    "akespeare"       0.82
    " lasted"         0.81
    " consisted"      0.79
    " consists"       0.75
    " specialize"     0.75
    " are"            0.74
    " resulted"       0.74
    " survives"       0.73
    " contributed"    0.73
    Activation Density: 0.023%

    No Known Activations