    Explanations

    phrases related to different kinds of reviews

    Negative Logits
    Token          Logit
    "plex"         -0.70
    "nown"         -0.70
    "ata"          -0.69
    "htar"         -0.68
    "atum"         -0.67
    "Sac"          -0.66
    " forcibly"    -0.65
    "stroke"       -0.63
    "oshi"         -0.63
    "ossus"        -0.63
    Positive Logits
    Token           Logit
    " reviews"      3.95
    " Reviews"      2.67
    " review"       2.46
    " reviewers"    2.41
    "review"        2.35
    " reviewer"     2.14
    "Review"        2.03
    " Review"       2.02
    " reviewed"     1.82
    "reviewed"      1.80
    Activation Density: 0.012%

    No Known Activations