INDEX
    Explanations

    expressions of emotion or reaction to situations

    Negative Logits
    "]);\n     -0.95
    enumii     -0.86
    ]));\n     -0.85
    "];\n      -0.85
    };*/       -0.80
    }`);       -0.80
    ]<<        -0.79
    });*/      -0.78
    })*/       -0.78
    {}",       -0.78
    Positive Logits
    ,          0.90
     Yep       0.74
    Yep        0.72
     hey       0.70
    Nope       0.69
    Hey        0.69
     yeah      0.69
     yep       0.68
     Nope      0.67
    Yup        0.66
    Activation Density: 0.209%

    No Known Activations