    Explanations

    instances of the word "review" and its variations

    Negative Logits
    htub      -0.19
    arr       -0.17
    fol       -0.16
    unter     -0.16
    gow       -0.16
    abyrin    -0.15
    cht       -0.15
    ook       -0.14
    quires    -0.14
    geber     -0.14

    Positive Logits
    able       0.25
    ees        0.24
    ee         0.22
    ers        0.20
    ingly      0.19
    ables      0.18
    nger       0.17
    çİĩ        0.17
    avar       0.17
    ABLE       0.17
    Activation Density: 0.029%

    No Known Activations