    Explanations

    phrases emphasizing awareness and understanding of information or concepts

    realize, aware, understand, recognize

    Negative Logits
    " للاسماء"     -0.81
    " パンチラ"     -0.69
    "Autoritní"    -0.68
    " beſch"       -0.68
    " Вікі"        -0.68
    " ſeinen"      -0.67
    "<pad>"        -0.67
    " Dieſe"       -0.67
    "[@BOS@]"      -0.66
    "<unused3>"    -0.66
    Positive Logits
    "Know"            0.49
    " know"           0.47
    " Know"           0.47
    "know"            0.47
    " understand"     0.46
    " knows"          0.46
    " remember"       0.45
    " faptul"         0.43
    "Remember"        0.42
    " understands"    0.42
    Activation Density 0.024%

    No Known Activations