INDEX
    Explanations

    logical connectors such as conjunctions and disjunctions

    Negative Logits
    ſelf         -0.86
    ]='\         -0.82
    '},↵         -0.80
    ++↵          -0.78
    />";         -0.78
    ſelves       -0.77
    '],↵         -0.75
    "])↵         -0.75
     pleaſure    -0.73
    `;↵          -0.73
    Positive Logits
     I            1.01
     you          0.88
     there        0.79
     we           0.78
     they         0.76
     stuff        0.75
     everything   0.72
     it           0.71
    I             0.71
     and          0.69
    Activation Density 0.274%

    No Known Activations