INDEX
    Explanations

    instances where someone is making a decision based on weighing potential risks

    Negative Logits
    ļéĨĴ          -0.77
    visory        -0.77
    ãĥĥãĥĪ        -0.75
    scribe        -0.74
    emaker        -0.71
    blem          -0.70
    cedented      -0.70
    vance         -0.70
    ãĥ¥           -0.69
    ãĤ¨ãĥ«        -0.69
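
Several of the negative-logit tokens above (for example ãĥĥãĥĪ) look like encoding debris but are more likely the display form of byte-level BPE vocabulary entries. The following is a minimal sketch of how such strings could be mapped back to text, assuming the model behind this dashboard uses a GPT-2 style byte-to-character table; the source does not state which tokenizer is used, and the helper name `decode_display_token` is hypothetical.

```python
def bytes_to_unicode() -> dict:
    """Byte -> display-character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's tokenizer follows this scheme.
    """
    # Bytes that are shown as themselves: '!'..'~', U+00A1..U+00AC, U+00AE..U+00FF.
    printable = (
        list(range(0x21, 0x7F))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to an unused code point starting at U+0100.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_display_token(token: str) -> str:
    """Hypothetical helper: turn a displayed vocabulary string into text."""
    # Strings containing characters outside the table (e.g. a literal space)
    # are already plain text, so return them unchanged.
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        return token
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A token may start or end in the middle of a multi-byte character;
    # such partial bytes are rendered as U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĥãĥĪ", "ãĥ¥", "ãĤ¨ãĥ«", "scribe"]:
        print(repr(tok), "->", repr(decode_display_token(tok)))
```

Under that assumption, ãĥĥãĥĪ, ãĥ¥ and ãĤ¨ãĥ« decode to the katakana fragments ット, ュ and エル, while ASCII tokens such as scribe pass through unchanged. The first token in the list begins with a stray continuation byte, so only its trailing character can be recovered.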
    Positive Logits
     huh            0.97
     albeit         0.95
     somew          0.92
     eh             0.91
     but            0.80
     yeah           0.77
     maybe          0.76
     haha           0.76
     nevertheless   0.73
     Anyway         0.70
    Activation Density: 0.307%

    No Known Activations