INDEX
Explanations
occurrences of the word "helpful" in the text
expressions related to helpfulness and assistance
Negative Logits
metal        -0.75
gow          -0.75
buck         -0.75
thur         -0.74
Vision       -0.70
Rush         -0.69
Bam          -0.69
pool         -0.68
inction      -0.67
scar         -0.67
Positive Logits
helpful      0.98
undermin     0.87
conduc       0.78
guiActiveUn  0.77
useful       0.76
aide         0.75
ãĤĭ          0.75
behavi       0.75
Helpful      0.74
ãĥ¼ãĥĨ       0.73
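
Two of the positive-logit tokens (ãĤĭ and ãĥ¼ãĥĨ) look like raw byte-level BPE vocabulary strings rather than readable text. The page does not say which tokenizer the model uses, so the following is only a minimal sketch under the assumption that it is a GPT-2 style byte-level BPE vocabulary. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of this dashboard.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2 style
    byte-level BPE (assumed here; the page does not name the tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to raw bytes, then decode as UTF-8.
    Raises KeyError if the token contains a character outside the table."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤĭ", "ãĥ¼ãĥĨ", "helpful"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĤĭ decodes to the hiragana る and ãĥ¼ãĥĨ to the katakana fragment ーテ, while plain ASCII tokens such as helpful are unchanged.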
Activations Density 0.007%