Explanations
helpful and useful feedback
Negative Logits
usable        0.91
meaningful    0.86
useful        0.80
useful        0.76
meaningfully  0.75
usefulness    0.75
Useful        0.73
مفید          0.73
bermanfaat    0.73
有用          0.72
Positive Logits
extremely     0.67
incredibly    0.64
immensely     0.60
Extremely     0.59
hel           0.59
insanely      0.56
estremamente  0.56
extremely     0.55
unbelievably  0.55
cực           0.51
Activations Density: 0.022%