INDEX
    Explanations

    phrases related to changes or increments

    phrases related to increases and decreases in various metrics or rates

    Negative Logits
    "eren"      -0.65
    "rief"      -0.65
    "orno"      -0.63
    "pb"        -0.61
    " Fun"      -0.60
    "ija"       -0.59
    "love"      -0.59
    "famous"    -0.58
    "view"      -0.58
    "OME"       -0.57
    Positive Logits
    " increases"     3.28
    " decreases"     2.83
    " Increases"     2.17
    "Increases"      2.05
    " increase"      2.05
    " rises"         1.95
    " reductions"    1.87
    " Increase"      1.83
    " boosts"        1.83
    "incre"          1.81
    Activation Density: 0.017%

    No Known Activations