    Explanations

    the word "call" followed by a high positive activation

    instances of the word "call."

    Negative Logits
    " istg"      -0.84
    "bilt"       -0.81
    " embr"      -0.70
    "inth"       -0.69
    "bourne"     -0.67
    "olitics"    -0.66
    "ynski"      -0.64
    "cffff"      -0.61
    "ipeg"       -0.61
    " inh"       -0.60
    Positive Logits
    "backs"       1.01
    "igraph"      0.98
    "call"        0.95
    "phas"        0.86
    " bullshit"   0.83
    "calling"     0.81
    " 911"        0.81
    " Calling"    0.80
    " bluff"      0.78
    "oused"       0.76
    Activation Density: 0.053%

    No Known Activations