Explanations
empathy and actionable advice
Negative Logits
hopefully        0.47
Hopefully        0.46
möglicherweise   0.44  (German: "possibly")
oznac            0.43
Hoping           0.42
implementations  0.40
presumably       0.39
potentially      0.39
こうした          0.39  (Japanese: "such" / "these kinds of")
additional       0.38
Positive Logits
empathetic       0.55
empathy          0.55
transparent      0.54
relentlessly     0.54
humility         0.52
storytelling     0.50
transparent      0.49
demonstrable     0.49
empath           0.47
unapolog         0.47
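The logit lists rank tokens by how strongly this feature's output raises or lowers their logits. A common way to derive such lists for a sparse-autoencoder feature is to take the dot product of the feature's decoder direction with each token's column of the model's unembedding matrix, then sort. The sketch below assumes that convention. `logit_effects`, `top_logits`, and all vectors are hypothetical toy values, not this feature's actual weights or the dashboard's own code. The sketch keeps signed values, so suppressed tokens come out negative.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the (hypothetical) feature decoder direction with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col)) for tok, col in unembed.items()}


def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (positive, negative) top-k lists from one ascending sort."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, largest first
    return positive, negative


# Hypothetical toy values for illustration only, not real model weights
decoder_dir = [0.5, -0.2, 0.1]
unembed = {
    "empathy": [0.8, -0.3, 0.2],
    "humility": [0.6, -0.1, 0.0],
    "hopefully": [-0.7, 0.4, 0.1],
    "presumably": [-0.3, 0.2, -0.1],
}

effects = logit_effects(decoder_dir, unembed)
pos, neg = top_logits(effects, k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting once and slicing both ends yields both lists from a single pass. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid sorting everything.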
Activation Density: 0.060%
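Activation density is the share of token positions on which the feature fires. A density of 0.060% works out to roughly 6 firing positions per 10,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of zero. That threshold, the `activation_density` helper, and the sample data are all hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds threshold (threshold is an assumption)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Hypothetical sample: the feature fires at 6 of 10,000 token positions
acts = [0.0] * 10_000
for i in (12, 480, 3021, 5555, 7010, 9999):
    acts[i] = 1.3

density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # prints "Activation density: 0.060%"
```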