    Explanations

    phrases that indicate obstacles or issues in discussions about relationships and societal challenges

    Negative Logits
    Token           Logit
    "exceptions"    -0.17
    "arel"          -0.16
    "ubbo"          -0.15
    "atre"          -0.15
    " errors"       -0.15
    " steps"        -0.14
    "ascade"        -0.14
    "YLES"          -0.14
    "ån"            -0.14
    "\terrors"      -0.14
    Positive Logits
    Token           Logit
    " sticking"     0.28
    " factor"       0.26
    " concern"      0.26
    " Achilles"     0.25
    " issue"        0.23
    " major"        0.23
    " hind"         0.23
    "factor"        0.22
    " th"           0.21
    " barrier"      0.21
    Activation Density: 0.126%

    No Known Activations