INDEX
Explanations
action verbs related to advice or recommendations
phrases that express recommendations or concerns
Negative Logits
..............    -0.69
........          -0.69
................  -0.64
'.                -0.63
ryu               -0.59
anners            -0.59
.........         -0.58
kamp              -0.57
liv               -0.57
."                -0.57
Positive Logits
ever         0.66
active       0.63
improvement  0.63
relates      0.62
involves     0.61
bothers      0.61
pursu        0.60
occurs       0.59
varies       0.59
%]           0.58
Activations Density: 0.374%