Explanations
derogatory terms and expressions related to poor behavior or attitudes
brat, spoilt
Negative Logits
Token      Logit
Portail    -0.58
!';        -0.55
%";        -0.53
]');       -0.53
'));       -0.52
)');       -0.52
Helios     -0.52
>';        -0.52
?}",       -0.51
__);       -0.50
Positive Logits
Token      Logit
Brat       2.44
Brat       2.28
brat       2.03
brat       1.66
Frat       0.75
rat        0.71
ratt       0.68
brata      0.66
Frat       0.66
bral       0.63
Activations Density 0.003%
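
A density figure like the one above is commonly read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard, which may compute the value differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active; the dashboard's exact definition is not stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 100,000 token positions.
sample = [0.0] * 99_997 + [1.2, 0.4, 2.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

A feature this sparse fires on roughly 3 tokens per 100,000, which fits a feature tied to a single word family such as "brat".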