Explanations
phrases related to poor quality or performance
Negative Logits
Token        Logit
ategory      -0.84
ubi          -0.81
SPONSORED    -0.80
Pi           -0.77
Ru           -0.68
irs          -0.67
":[          -0.66
{:           -0.66
pta          -0.66
CU           -0.66
Positive Logits
Token        Logit
quality      1.15
hygiene      1.11
judgement    0.98
luck         0.96
manners      0.95
imitation    0.94
digestion    0.91
performers   0.91
judgment     0.90
quality      0.90
Activation Density: 0.064%
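
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the definition, not something the page states. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only so the arithmetic is easy to follow.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition (not confirmed by the source page): density is the
    share of sampled token positions on which the feature is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 64 of which are active.
sample = [0.0] * 99_936 + [1.0] * 64
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.064% means the feature is active on roughly 64 of every 100,000 sampled tokens.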