INDEX
    Explanations

    phrases that indicate conditionality or absence

    Negative Logits
    ド        -0.77
    ヘ        -0.73
    quer      -0.72
    mon       -0.71
    ery       -0.69
    mers      -0.68
    late      -0.67
    rolled    -0.66
    oka       -0.66
    cow       -0.66
    Positive Logits
     risking         0.96
     knowing         0.91
     encountering    0.88
     sacrificing     0.86
     mentioning      0.85
     noticing        0.82
     compromising    0.81
     recourse        0.79
     realizing       0.78
     violating       0.76
    Act Density 0.020%
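
The density figure above is usually read as the share of sampled tokens on which the feature fires. A minimal sketch of that calculation, assuming density means the percentage of tokens whose activation exceeds zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count tokens on which the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 2 active tokens out of 10,000.
sample = [0.0] * 9998 + [1.3, 0.4]
print(f"{activation_density(sample):.3f}%")  # prints "0.020%"
```

Under this reading, a density of 0.020% corresponds to roughly 2 active tokens in every 10,000.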

    No Known Activations