INDEX
    Explanations

    words or phrases indicating obstacles, challenges, or difficulties

    phrases that indicate difficulty or obstacles

    Negative Logits
        Token         Logit
        "Stars"       -0.72
        "Originally"  -0.68
        "!/"          -0.67
        "Introduced"  -0.65
        " Variant"    -0.63
        "rika"        -0.62
        "roma"        -0.62
        "kind"        -0.62
        "rak"         -0.62
        "mology"      -0.59

    Positive Logits
        Token         Logit
        "enged"        0.80
        "aneously"     0.75
        " prey"        0.74
        "ible"         0.73
        "chain"        0.71
        "ioned"        0.68
        "anced"        0.68
        "untary"       0.67
        "forced"       0.66
        " enforce"     0.65
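
    The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). This page does not say how they were computed; a common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the k highest and k lowest scores. The sketch below uses a made-up 3-dimensional model, a 4-token vocabulary, and hypothetical helper names (`feature_logits`, `top_and_bottom`); its numbers are illustrative and are not the values listed above.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def feature_logits(decoder_dir: List[float],
                   unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed method)."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vector))
        for token, vector in unembed.items()
    }


def top_and_bottom(logits: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    ranked = sorted(logits.items(), key=lambda item: item[1], reverse=True)
    positive = ranked[:k]
    # Re-sort the tail ascending so the most negative token comes first.
    negative = sorted(ranked[-k:], key=lambda item: item[1])
    return positive, negative


if __name__ == "__main__":
    # Hypothetical decoder direction and unembedding vectors.
    decoder_dir = [1.0, 0.5, -0.5]
    unembed = {
        " enforce": [0.4, 0.2, -0.2],
        "forced": [0.3, 0.2, 0.0],
        "Stars": [-0.4, 0.0, 0.2],
        " Variant": [-0.2, -0.2, 0.0],
    }
    pos, neg = top_and_bottom(feature_logits(decoder_dir, unembed), k=2)
    print("positive:", [token for token, _ in pos])  # [' enforce', 'forced']
    print("negative:", [token for token, _ in neg])  # ['Stars', ' Variant']
```

    Leading spaces are kept inside the quotes because tokenizers usually treat " enforce" and "enforce" as different tokens.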
    Activation Density: 0.055%
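
    Activation density is the fraction of evaluated tokens on which the feature fires, so 0.055% works out to roughly 55 activating tokens per 100,000. The corpus and firing threshold behind this figure are not stated here; the sketch below assumes a token counts as active when its activation is strictly greater than zero, and uses a hypothetical helper name (`activation_density`) with synthetic activations.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Synthetic example: the feature fires on 55 of 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(0, 55_000, 1_000):
        acts[i] = 1.0
    print(f"{activation_density(acts):.3%}")  # prints 0.055%
```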

    No Known Activations