Explanations
phrases related to giving advice or instructions
Negative Logits
ufficient      -0.66
cup            -0.60
ashington      -0.58
ighed          -0.56
luent          -0.56
ounge          -0.54
cious          -0.54
worthiness     -0.53
aneous         -0.52
Core           -0.51
Positive Logits
rant           0.64
denial         0.64
reorgan        0.64
tir            0.62
looting        0.60
姫             0.59
adventures     0.59
reckless       0.58
Ô              0.58
warr           0.56
Activations Density 21.874%