INDEX
Explanations
words related to providing helpful suggestions or feedback
terms related to constructive actions and feedback, often in contrast to abusive or negative behavior
Negative Logits
Token         Logit
orph          -0.90
atch          -0.80
urg           -0.76
ared          -0.75
aver          -0.72
gars          -0.72
alm           -0.69
olog          -0.69
paralle       -0.69
andra         -0.68
Positive Logits
Token         Logit
constructive  1.15
criticism     0.85
feedback      0.83
-+-+          0.77
entreprene    0.76
redes         0.75
repr          0.74
sunlight      0.71
daylight      0.71
criticisms    0.70
Activations Density 0.011%