    Explanations

    phrases indicating problems or challenges in various contexts

    Head Attribution Weights (head index:weight)
    0:0.05
    1:0.02
    2:0.08
    3:0.45
    4:0.03
    5:0.07
    6:0.02
    7:0.05
    8:0.03
    9:0.01
    10:0.12
    11:0.02
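
The per-head weights above can be ranked to see which head dominates. The following is a minimal sketch, not part of the original dashboard: the dictionary simply restates the listed values, and the helper name `rank_heads` is hypothetical.

```python
# Head attribution weights restated from the listing above (head index -> weight).
head_weights = {
    0: 0.05, 1: 0.02, 2: 0.08, 3: 0.45, 4: 0.03, 5: 0.07,
    6: 0.02, 7: 0.05, 8: 0.03, 9: 0.01, 10: 0.12, 11: 0.02,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]
total = sum(head_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

With these values, head 3 carries the largest weight (0.45) and head 10 the second largest (0.12).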
    Negative Logits
    Token            Logit
    " united"        -2.52
    "Peace"          -2.51
    "united"         -2.42
    " Nobel"         -2.35
    " Emmanuel"      -2.34
    " Liberation"    -2.29
                     -2.16
    " Peace"         -2.13
                     -2.13
    "Together"       -2.13
    Positive Logits
    Token              Logit
    " cumbers"         4.04
    " annoying"        3.72
    " sloppy"          3.70
    " distractions"    3.65
    " clutter"         3.62
    " glitches"        3.52
    " awkward"         3.51
    " tedious"         3.50
    " delays"          3.48
    " confusion"       3.43
    Activation Density: 1.525%

    No Known Activations