INDEX
    Explanations

    expressions conveying certainty or confidence in understanding or doing something

    first-person references to knowledge and self-awareness

    Negative Logits
    Token           Logit
    " Alive"        -0.66
    "••"            -0.62
    " Legends"      -0.59
    " nowhere"      -0.58
    " Chron"        -0.58
    "mony"          -0.58
    " awareness"    -0.56
    " Dram"         -0.56
    " Fortune"      -0.55
    " guiActive"    -0.55

    Positive Logits
    Token           Logit
    "'re"           0.83
    "mean"          0.81
    " mean"         0.75
    " fuss"         0.74
    "doing"         0.74
    " Mean"         0.73
    "/$"            0.72
    " meant"        0.71
    " entail"       0.70
    "need"          0.69
    Activation Density: 0.084%

    No Known Activations