    Explanations

    instances of the word "review" and its variations

    Negative Logits
    fol       -0.18
    gow       -0.16
    uraa      -0.16
    bell      -0.16
    arr       -0.16
    hower     -0.15
    geber     -0.15
    abyrin    -0.15
    373       -0.14
    htub      -0.14
    Positive Logits
    able       0.26
    ees        0.21
    ers        0.20
    ables      0.18
    ee         0.18
    /meta      0.17
    uated      0.17
    ABLE       0.17
    iger       0.17
    /comment   0.16
    Activation Density: 0.027%

    No Known Activations