INDEX
    Explanations

    phrases related to doing tasks, activities, or actions

    references to actions or behaviors being performed

    Negative Logits
    Token            Logit
    " Dwell"         -0.72
    " Flavoring"     -0.66
    " sshd"          -0.63
    "llah"           -0.60
    "âĸĵ"            -0.59
    " Returning"     -0.58
    "allion"         -0.58
    "éŃĶ"            -0.58
    " ozone"         -0.58
    "ixed"           -0.57
    Positive Logits
    Token              Logit
    " differently"     1.09
    " wrong"           0.92
    "wrong"            0.89
    " unconsciously"   0.85
    " cheaply"         0.82
    " backwards"       0.82
    " responsibly"     0.77
    " offensively"     0.76
    " chores"          0.76
    " efficiently"     0.76
    Activation Density: 0.109%

    No Known Activations