INDEX
    Explanations

    adjectives or descriptors

    Negative Logits
     this       0.94
    这个          0.92
    有着          0.85
    "           0.85
     ought      0.84
     these      0.84
    的时候         0.83
     an         0.83
     thing      0.82
     twenty     0.80
    Positive Logits
     Yes        1.17
    Yes         1.05
     $+$        1.02
     Usually    1.00
     Mostly     0.99
     +,         0.98
     +          0.96
     Mainly     0.91
    Usually     0.91
     Avg        0.91
    Activation Density: 0.201%

    No Known Activations