INDEX
    Explanations

    phrases related to simplicity and plans

    Negative Logits
    " depic"         -1.04
    " intersper"     -1.00
    " shenan"        -0.99
    " disagre"       -0.97
    " inappro"       -0.95
    " gild"          -0.93
    " accla"         -0.92
    " encomp"        -0.92
    " apprehen"      -0.90
    " Shakspeare"    -0.89
    Positive Logits
    " simple"        1.17
    "simple"         1.12
    "Simple"         1.12
    " Simple"        1.10
    " simples"       1.02
    "SIMPLE"         0.98
    " SIMPLE"        0.93
    " simplest"      0.84
    " simplicity"    0.82
    " simpler"       0.82
    Activation Density: 0.076%

    No Known Activations