INDEX
    Explanations

    adjectives and conjunctions

    words and phrases related to contradictions and comparisons

    Negative Logits
    Token         Logit
    "ftime"       -0.71
    "amus"        -0.70
    "izabeth"     -0.69
    " herself"    -0.64
    " exodus"     -0.61
    "onement"     -0.61
    "legate"      -0.59
    "usk"         -0.58
    "lon"         -0.58
    "ffen"        -0.58

    Positive Logits
    Token         Logit
    "they"         1.65
    " They"        1.51
    " they"        1.47
    "They"         1.41
    " THEY"        1.39
    " ones"        1.14
    "their"        1.11
    "These"        1.10
    " These"       1.08
    "Their"        1.04

    Activation Density: 0.716%

    No Known Activations