Explanations
phrases related to advising or urging against certain actions
phrases related to abstaining or refraining from actions
Negative Logits
ammy  -1.01
NetMessage  -0.75
immer  -0.70
rations  -0.69
ramid  -0.69
ovies  -0.67
neau  -0.67
odes  -0.67
oğ  -0.66
onomy  -0.65
Positive Logits
refrain  1.18
rences  0.90
abst  0.86
SourceFile  0.78
////////  0.69
ministic  0.69
stren  0.67
answering  0.67
acknow  0.66
swer  0.66
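Dashboards like this one commonly build the positive and negative logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the tokens with the largest and smallest scores. The sketch below assumes that method. `decoder_row`, `unembed`, `vocab` and `logit_effects` are hypothetical names, and the toy data is illustrative only, not taken from this feature.

```python
from typing import List, Tuple


def logit_effects(
    decoder_row: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive token logit effects.

    Assumptions (hypothetical layout):
      decoder_row: the feature's decoder direction, length d_model.
      unembed: unembedding matrix as rows of length n_vocab, shape [d_model][n_vocab].
      vocab: token strings, length n_vocab.
    """
    d_model = len(decoder_row)
    n_vocab = len(vocab)
    # Effect on token t's logit = dot(decoder_row, column t of unembed).
    scores = [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(n_vocab)
    ]
    # Sort ascending by score: most negative first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example with d_model = 2 and a four-token vocabulary.
vocab = ["refrain", "abst", "ammy", "onomy"]
unembed = [
    [1.0, 0.5, -1.0, 0.0],
    [0.5, 0.5, 0.0, -1.0],
]
neg, pos = logit_effects([1.0, 0.2], unembed, vocab, k=2)
print(neg)  # tokens whose logits the feature pushes down
print(pos)  # tokens whose logits the feature pushes up
```

A real dashboard may also normalize the decoder row or work with much larger matrices through an array library. The ranking step stays the same.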
Activations Density 0.008%
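Activation density is usually read as the share of sampled tokens on which the feature fires. At 0.008%, that is roughly 8 tokens in every 100,000. The sketch below assumes "fires" means an activation strictly above zero. The function name and the `threshold` parameter are hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes one activation value per sampled token.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: 100,000 tokens, of which 8 have a positive activation.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 10_000] = 1.5
print(activation_density(acts))
```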