INDEX
    Explanations

    adjectives related to characteristics or qualities

    references to various forms of written content or narratives

    Negative Logits
    Token         Logit
    "xtap"        -0.67
    " Seventh"    -0.65
    " sugg"       -0.61
    " Phill"      -0.60
    " Branch"     -0.59
    " Pru"        -0.58
    "verty"       -0.57
    "å¼"          -0.57
    " 529"        -0.57
    " Tick"       -0.56
    Positive Logits
    Token             Logit
    " nonetheless"    1.10
    " itself"         0.98
    " anyway"         0.78
    " ourselves"      0.77
    " anyways"        0.76
    " oneself"        0.76
    " alike"          0.75
    "nesses"          0.75
    " ain"            0.74
    " herself"        0.73
    Activation Density: 0.414%

    No Known Activations