INDEX
    Explanations

    words related to safety and precision

    terms related to quality, safety, and fairness in various contexts

    Negative Logits
    "ocene"      -0.78
    "ften"       -0.78
    " athlet"    -0.72
    "alian"      -0.69
    "ittee"      -0.69
    "cone"       -0.68
    "hell"       -0.67
    "utra"       -0.66
    "ourke"      -0.65
    "anwhile"    -0.65
    Positive Logits
    "ness"              0.93
    " amounts"          0.86
    "nesses"            0.85
    " doses"            0.79
    " circumstances"    0.77
    " explanations"     0.76
    " medical"          0.76
    " levels"           0.75
    " quality"          0.75
    " situations"       0.75
    Activation Density: 0.507%

    No Known Activations