INDEX
    Explanations

    the word "ga" appearing at different levels of activation

    repeated mentions of the word "ga"

    Negative Logits
    "chard"          -0.79
    "tz"             -0.79
    "itaire"         -0.72
    "icable"         -0.70
    " Cosponsors"    -0.69
    "ecause"         -0.69
    "sbm"            -0.69
    "itarian"        -0.68
    "ality"          -0.67
    "alities"        -0.67
    Positive Logits
    "terday"         0.98
    "ignt"           0.72
    "enaries"        0.71
    " Nieto"         0.67
    " indu"          0.64
    "arde"           0.64
    "andise"         0.61
    " Varg"          0.61
    "veyard"         0.61
    "udo"            0.60
    Activation Density: 0.041%

    No Known Activations