INDEX
    Explanations

    phrases related to consequences or methods

    phrases indicating significance or implications of various subjects

    Negative Logits
    Token           Logit
    "reb"           -0.72
    "edit"          -0.65
    "greg"          -0.59
    "alted"         -0.59
    "older"         -0.59
    "rex"           -0.58
    " Bene"         -0.57
    "rage"          -0.56
    "ersen"         -0.56
    "icent"         -0.56
    Positive Logits
    Token           Logit
    " means"        3.65
    " Means"        2.62
    " meant"        1.92
    " mean"         1.81
    " entails"      1.53
    " signifies"    1.52
    " implies"      1.50
    " translates"   1.49
    " equals"       1.32
    " denotes"      1.24
    Activation Density: 0.027%

    No Known Activations