INDEX
    Explanations

    references to specific brands or names within the text

    Negative Logits
    gether        -0.23
    bidden        -0.22
    etheless      -0.20
    adays         -0.19
    tempts        -0.19
    achusetts     -0.18
    vasion        -0.17
    quarters      -0.17
    theless       -0.16
    gomery        -0.16
    Positive Logits
    orem          0.17
    chyb          0.16
    infeld        0.15
    imiter        0.15
    -instagram    0.14
    ="__          0.14
    iland         0.14
    bie           0.14
    dish          0.14
    alic          0.13
    Activation Density: 0.193%

    No Known Activations