INDEX
    Explanations

    mentions of the word "attacks" with a high activation value

    Negative Logits
    "SOURCE"           -0.77
    " pall"            -0.75
    "SPONSORED"        -0.66
    " Quarterly"       -0.65
    " divest"          -0.64
    " commencement"    -0.62
    " curv"            -0.62
    " planetary"       -0.62
    "©¶æ"              -0.61
    " concurrent"      -0.59

    Positive Logits
    "ats"               1.35
    "wana"              1.07
    "terness"           1.03
    "icket"             1.03
    "abase"             1.02
    "herer"             0.96
    "heet"              0.96
    "acus"              0.93
    "htaking"           0.92
    "chers"             0.92

    Act Density 0.012%

    No Known Activations