INDEX
    Explanations

    negative assertions or contradictions

    negations and negative expressions

    Negative Logits
    Token               Logit
    " decency"          -0.62
    "itiz"              -0.59
    " selves"           -0.58
    " camer"            -0.57
    " civilisation"     -0.56
    "ÙĴ"                -0.56
    "ewitness"          -0.56
    "velt"              -0.55
    "etimes"            -0.54
    "lycer"             -0.54
    Positive Logits
    Token               Logit
    " shy"              1.28
    " exactly"          0.97
    " necessarily"      0.96
    " hesitated"        0.93
    " amused"           0.88
    " alone"            0.87
    "icably"            0.86
    "yet"               0.85
    "orious"            0.85
    " thrilled"         0.82
    Act Density 0.210%

    No Known Activations