INDEX
    Explanations

    references to decisions or actions being made

    phrases that involve various forms of the word "move" indicating actions or changes

    Negative Logits
    "omial"       -0.74
    " sqor"       -0.72
    "english"     -0.68
    "etheless"    -0.66
    " Koran"      -0.65
    "ordon"       -0.62
    "aples"       -0.62
    "sung"        -0.61
    "errors"      -0.61
    "iciency"     -0.59

    Positive Logits
    " toward"      0.86
    " towards"     0.85
    "able"         0.83
    "ments"        0.81
    "backs"        0.80
    "rers"         0.79
    "over"         0.77
    "wright"       0.75
    "ler"          0.74
    "llan"         0.73
    Activation Density: 0.035%

    No Known Activations