INDEX
    Explanations

    instances of the word "or" with high activation values

    the word "or" and its usage in various contexts

    Negative Logits
    Token         Logit
    "mostly"      -0.79
    "then"        -0.65
    "now"         -0.65
    "tackle"      -0.64
    "probably"    -0.59
    "NOW"         -0.58
    "Lots"        -0.58
    "Probably"    -0.57
    "Almost"      -0.56
    "Here"        -0.56
    Positive Logits
    Token            Logit
    " anything"      1.54
    " any"           1.24
    " anywhere"      1.18
    " anybody"       1.13
    " slightest"     1.12
    " anyone"        1.11
    " even"          1.10
    "chard"          1.10
    " anymore"       1.09
    " whatsoever"    1.09
    Activation Density: 0.093%

    No Known Activations