INDEX
    Explanations

    the word "get" at strong activations

    instances of the phrase "I get" or similar expressions indicating understanding or realization

    Negative Logits
    " Madness"        -0.62
    " depiction"      -0.61
    "ridge"           -0.61
    "cius"            -0.60
    " Palestin"       -0.59
    " Archdemon"      -0.59
    " enclosure"      -0.58
    "iege"            -0.57
    "enture"          -0.57
    " annex"          -0.57

    Positive Logits
    " rid"            1.10
    "tin"             1.02
    "TING"            0.96
    "aways"           0.85
    " bored"          0.80
    " lucky"          0.79
    " acquainted"     0.76
    "terson"          0.76
    "DragonMagazine"  0.75
    " tired"          0.73

    Act Density 0.115%

    No Known Activations