INDEX
    Explanations

    phrases introducing characteristics or descriptions

    Head Attr Weights
    0:0.07
    1:0.07
    2:0.07
    3:0.07
    4:0.08
    5:0.08
    6:0.09
    7:0.09
    8:0.08
    9:0.08
    10:0.08
    11:0.07
    Negative Logits
    "uyomi": -2.23
    "��": -2.18
    "keyes": -1.92
    "glers": -1.89
    "jri": -1.89
    "ursions": -1.84
    "ashtra": -1.83
    "vable": -1.81
    "��": -1.80
    " sqor": -1.80
    Positive Logits
    "Diamond": 1.93
    "Cond": 1.76
    " sexist": 1.75
    " homophobic": 1.74
    "Interstitial": 1.70
    " QC": 1.69
    " stating": 1.65
    "Recommend": 1.65
    " wherein": 1.64
    " casting": 1.61
    Act Density 0.000%

    No Known Activations