INDEX
    Explanations

    phrases indicating similarity or comparison

    comparisons expressing similarity

    Negative Logits
    alez        -0.93
    ourse       -0.87
    ESA         -0.83
    Ö¼          -0.83
    isexual     -0.82
    inion       -0.80
    onding      -0.78
    alt         -0.76
    Cause       -0.75
    ipolar      -0.74
    Positive Logits
    lier         1.09
    lihood       1.02
    liest        0.96
     Andromeda   0.74
     crap        0.72
     gib         0.70
    flame        0.69
     Carth       0.66
     lifeless    0.66
    liness       0.66
    Act Density 0.033%

    No Known Activations