INDEX
    Explanations

    the word "simple" accompanied by a high activation value

    the term "simple" in various contexts

    Negative Logits
    Token                 Logit
    " largeDownload"      -0.79
    "raints"              -0.74
    "vance"               -0.71
    "ingle"               -0.70
    "inburgh"             -0.69
    "hips"                -0.69
    "igham"               -0.68
    " extensively"        -0.68
    " vigorously"         -0.66
    "¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯"    -0.66
    Positive Logits
    Token                 Logit
    "tons"                1.20
    "minded"              0.93
    "wallet"              0.92
    "ton"                 0.86
    "json"                0.86
    " syrup"              0.81
    " arithmetic"         0.80
    " straightforward"    0.80
    "coded"               0.79
    "ified"               0.78
    Activation Density: 0.023%

    No Known Activations