INDEX
    Explanations

    adjectives describing behaviors or characteristics

    Negative Logits
    Token           Logit
    "iverse"        -0.78
    "undown"        -0.75
    "artifacts"     -0.71
    "isites"        -0.70
    "orthy"         -0.69
    "Ranked"        -0.67
    "imester"       -0.66
    " Sphere"       -0.66
    "fields"        -0.66
    " Gutenberg"    -0.66
    Positive Logits
    Token             Logit
    " optimism"       1.10
    " caution"        1.06
    " refusal"        1.01
    " attitude"       1.00
    " honesty"        1.00
    " humility"       1.00
    " demeanor"       0.99
    " prag"           0.99
    " indignation"    0.98
    " arrogance"      0.97
    Activation Density: 3.839%

    No Known Activations