INDEX
    Explanations

    phrases indicating past actions or experiences

    phrases indicating strong opposition or conflict

    Negative Logits
    Token          Logit
    "rous"         -0.62
    "upper"        -0.62
    "ogical"       -0.61
    " optic"       -0.60
    "rial"         -0.59
    "LIB"          -0.58
    " ransom"      -0.57
    "orus"         -0.57
    " theaters"    -0.57
    "ery"          -0.56
    Positive Logits
    Token          Logit
    "ĪĴ"           0.91
    " lately"      0.84
    "recent"       0.82
    " now"         0.78
    " hasn"        0.71
    "progress"     0.69
    "alach"        0.69
    "ierrez"       0.68
    " recently"    0.66
    "ateur"        0.66
    Activation Density: 0.724%

    No Known Activations