INDEX
    Explanations

    phrases related to moral values and ethical considerations

    Negative Logits
    ########.
    -0.84
    Попис
    -0.77
     navideños
    -0.70
     ModelRenderer
    -0.69
    TextAppearance
    -0.68
     calendriers
    -0.67
    featureID
    -0.65
     évi
    -0.65
     Bennett
    -0.65
    دانشنامهٔ
    -0.64
    Positive Logits
            
    0.93
    <tr>
    0.86
        
    0.83
           
    0.81
       
    0.81
    </th>
    0.80
    		
    0.79
    enumi
    0.78
    </h2>
    0.77
         
    0.77
    Activation Density: 0.039%

    No Known Activations