Explanations
references to helpfulness and advice
Negative Logits
Token        Logit
ETCH         -0.14
DoubleClick  -0.14
lis          -0.14
бол          -0.14
azor         -0.14
èī           -0.14
Incontri     -0.13
#End         -0.13
انيا         -0.13
femin        -0.13
Positive Logits
Token        Logit
useful       0.77
Useful       0.67
helpful      0.61
handy        0.56
usefulness   0.55
полез        0.54
Helpful      0.52
valuable     0.47
Handy        0.46
hữu          0.43
Activations Density 0.276%
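
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number is commonly computed. It assumes density means the fraction of sampled token positions on which the feature's activation exceeds a threshold. The function name activation_density, the threshold parameter, and the toy data are all hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (active positions) / (all sampled positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data, made up for illustration: 3 active positions out of 1000.
toy = [0.0] * 997 + [0.8, 1.3, 0.2]
print(f"{activation_density(toy):.3%}")  # prints 0.300%
```

Under this reading, a density of 0.276% would mean the feature fires on roughly 276 of every 100,000 sampled tokens.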