    Explanations

    phrases prompting the reader to pay attention or consider something

    Negative Logits
    "lees"      -0.87
    "oing"      -0.80
    "sbm"       -0.79
    "soever"    -0.74
    "iar"       -0.74
    "=~=~"      -0.74
    "oided"     -0.70
    "iere"      -0.68
    "raged"     -0.68
    "ittle"     -0.66
    Positive Logits
    " WHY"          1.11
    " something"    1.08
    " what"         1.05
    " why"          1.04
    " how"          0.99
    " causation"    0.97
    " ourselves"    0.96
    " basics"       0.94
    " some"         0.91
    " facts"        0.90
    Activation Density 0.218%

    No Known Activations