INDEX
    Explanations

    phrases indicating contrast or difference

    phrases that indicate differences or variations between subjects

    Negative Logits
    icide       -0.71
    VICE        -0.67
    indust      -0.66
    record      -0.64
    phies       -0.64
    vice        -0.63
    wind        -0.63
    ongyang     -0.62
    stop        -0.62
    ocious      -0.62
    Positive Logits
    Different       0.78
     differing      0.77
    ":"/            0.77
     Differences    0.77
     personalities  0.71
     depending      0.70
     timelines      0.70
     Original       0.68
     Same           0.67
    Race            0.67
    Activation Density: 0.583%

    No Known Activations