    Explanations

    mentions of the term "junk" at different activation levels

    occurrences of the term "unk," suggesting a focus on unspecified or unknown entities or terms

    Negative Logits
    Token           Logit
    "voy"           -0.76
    "APH"           -0.64
    " expressive"   -0.63
    " flare"        -0.60
    " hor"          -0.60
    " unintended"   -0.59
    " flared"       -0.58
    "effective"     -0.58
    "orsi"          -0.57
    " latitude"     -0.56
    Positive Logits
    Token           Logit
    "buster"        1.12
    "geon"          1.04
    "irk"           0.97
    "rat"           0.94
    "etsu"          0.91
    "busters"       0.91
    "ernel"         0.90
    "regate"        0.90
    "lift"          0.89
    "ett"           0.88
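    The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The following is a minimal sketch of one common way such lists are produced. It assumes that the feature's effect on a token is the dot product of the feature's decoder direction with that token's unembedding vector. The page does not state its exact method. The helper names `logit_effects` and `top_and_bottom` and the toy matrices are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Hypothetical helper: score token j as decoder_vec . unembed[:, j].

    Assumption: unembed is d_model x vocab_size, stored row-major.
    """
    if len(unembed) != len(decoder_vec):
        raise ValueError("decoder_vec length must equal unembed row count")
    return {
        tok: sum(d * row[j] for d, row in zip(decoder_vec, unembed))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most negative, k most positive), each ordered by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]                  # ascending: most negative first
    positive = list(reversed(ranked))[:k]  # descending: most positive first
    return negative, positive


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d"]
    decoder = [1.0, -0.5]                   # toy 2-d decoder direction
    unembed = [[0.2, -1.0, 0.5, 0.0],       # toy 2 x 4 unembedding matrix
               [0.4, 0.0, -1.0, 1.0]]
    neg, pos = top_and_bottom(logit_effects(decoder, unembed, vocab), k=2)
    print("negative:", neg)  # negative: [('b', -1.0), ('d', -0.5)]
    print("positive:", pos)  # positive: [('c', 1.0), ('a', 0.0)]
```

    `top_and_bottom` also works directly on the scores listed on this page. For example, passing `{"voy": -0.76, "buster": 1.12}` with `k=1` returns `voy` as the most negative token and `buster` as the most positive.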
    Activation Density: 0.020%
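    Activation density is the share of tokens on which the feature fires. A density of 0.020% means 0.0002 of tokens, or roughly 1 token in 5,000. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The page does not define the cutoff. The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation > threshold.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy stream: one firing token out of 5,000 gives a density of 0.020%.
    acts = [0.0] * 4999 + [1.3]
    print(f"{activation_density(acts):.3f}%")  # 0.020%
```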

    No Known Activations