INDEX
    Explanations

    phrases indicating inevitability or strong potential for something to happen

    terms associated with inevitability and consequence

    Negative Logits
    Token           Logit
    "activation"    -0.62
    "gdala"         -0.60
    " throats"      -0.59
    "eeks"          -0.59
    "talk"          -0.57
    "ynthesis"      -0.57
    "AMES"          -0.57
    "chat"          -0.55
    "papers"        -0.55
    " helicop"      -0.55
    Positive Logits
    Token           Logit
    "ingly"         0.93
    "iously"        0.87
    "ibly"          0.87
    "ously"         0.84
    "uously"        0.83
    "uably"         0.78
    "ly"            0.76
    "entimes"       0.75
    "ensibly"       0.75
    "ossal"         0.74
    Activation Density: 0.264%

    No Known Activations