INDEX
    Explanations

    Sentences that provide positive evaluations or feedback

    Text followed by question/answer format

    Negative Logits
     AppCompatTheme  -0.60
    født             -0.54
     ("              -0.51
    Rejo             -0.51
    ennom            -0.51
    randomUUID       -0.50
     <--             -0.50
    often            -0.48
    gheny            -0.48
     arguably        -0.48
    Positive Logits
    InjectAttribute  0.61
    <eos>            0.61
    ")));\n          0.58
     rescheduled     0.58
    =$?              0.57
    basicConfig      0.56
    rima             0.56
     jätte           0.55
    expandindo       0.55
     staff           0.54
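Assuming these logit lists follow the usual logit-lens convention for SAE feature dashboards, each value is the change in a vocabulary token's logit when the feature's decoder direction is projected through the model's unembedding matrix; the lists show the most negative and most positive entries. The sketch below illustrates that computation in plain Python. The helper `top_logit_effects`, the decoder vector `w_dec` (length d_model), and the unembedding `W_U` (d_model rows by vocab_size columns) are hypothetical, with random numbers standing in for real model weights.

```python
import random

def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper: w_dec is one feature's decoder direction and
    W_U is an unembedding stored as a list of d_model rows.
    """
    vocab_size = len(vocab)
    # Logit-lens projection: logit[t] = sum_j w_dec[j] * W_U[j][t]
    logits = [
        sum(w_dec[j] * W_U[j][t] for j in range(len(w_dec)))
        for t in range(vocab_size)
    ]
    # Token indices sorted from most negative to most positive logit.
    order = sorted(range(vocab_size), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    # Largest logits first.
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive

# Toy example: random weights stand in for a real model (assumption).
rng = random.Random(0)
d_model, vocab_size = 16, 50
vocab = [f"tok{i}" for i in range(vocab_size)]
w_dec = [rng.gauss(0, 1) for _ in range(d_model)]
W_U = [[rng.gauss(0, 1) for _ in range(vocab_size)] for _ in range(d_model)]

neg, pos = top_logit_effects(w_dec, W_U, vocab, k=3)
for tok, val in neg + pos:
    print(f"{tok:>6} {val:+.2f}")
```

Because the projection is linear and involves no activations, lists like these describe what the feature would push the model to predict, not the contexts in which it fires.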
    Act Density 0.026%
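Activation density is conventionally the percentage of sampled tokens on which the feature's activation is above zero. At 0.026% that is roughly one token in every 3,800 (1 / 0.00026 ≈ 3,846), so this is a very sparse feature. A minimal sketch, assuming a flat list of per-token activations; the function name `activation_density` and the zero threshold are illustrative assumptions, not a documented definition.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Toy sample: 2 firing tokens out of 10,000.
acts = [0.0] * 10_000
acts[17] = 1.3
acts[4242] = 0.4
print(f"{activation_density(acts):.3f}%")  # prints 0.020%
```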

    No Known Activations