    Explanations

    the word "ton" with high activation values

    the repeated mention of the word "ton."

    Negative Logits
        " Phant"       -0.71
        "Craft"        -0.70
        " Danger"      -0.67
        " Cust"        -0.67
        "ELD"          -0.67
        "Constructed"  -0.67
        " Clinic"      -0.63
        "Journal"      -0.61
        " Forever"     -0.61
        "pring"        -0.60
    Positive Logits
        "neau"          1.05
        "ne"            0.92
        "arg"           0.90
        "Ton"           0.89
        "aton"          0.89
        "ights"         0.87
        "umen"          0.87
        "eful"          0.86
        " earthqu"      0.86
        "icum"          0.85
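Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative entries. The sketch that follows assumes that method. `top_logits`, `decoder_vec`, `W_U`, and the toy vocabulary are hypothetical names for illustration. They are not this dashboard's actual code or API.

```python
import numpy as np


def top_logits(decoder_vec, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumption (hypothetical method): effect = decoder direction @ unembedding.
    decoder_vec: shape (d_model,), the feature's decoder direction
    W_U:         shape (d_model, d_vocab), the unembedding matrix
    vocab:       list of d_vocab token strings
    """
    logits = decoder_vec @ W_U          # (d_vocab,) logit contribution per token
    order = np.argsort(logits)          # token indices, ascending by contribution
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -0.2, 1.0, -0.7],
                [0.3, 0.3, 0.3, 0.3]])
decoder_vec = np.array([1.0, 0.0])
positive, negative = top_logits(decoder_vec, W_U, vocab, k=2)
print(positive)  # [('c', 1.0), ('a', 0.5)]
print(negative)  # [('d', -0.7), ('b', -0.2)]
```

Sorting once and reading both ends of the order gives both lists without a second pass over the vocabulary.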
    Activation Density: 0.018%
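Activation density is the share of token positions on which the feature fires, expressed as a percentage. The sketch that follows assumes "fires" means an activation strictly above a threshold of 0. `activation_density` and `threshold` are hypothetical names, and the dashboard's exact threshold is not stated in the source.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption (hypothetical): a position counts as active when act > threshold.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# 18 active positions out of 100,000 tokens gives a density of 0.018%.
acts = np.zeros(100_000)
acts[:18] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.018%
```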

    No Known Activations