INDEX
Explanations
phrases related to empathy and support for others
instances of the word "and," indicating a focus on conjunctions or connections between ideas
Negative Logits
Token       Logit
Sat         -0.76
agen        -0.71
itarian     -0.68
lav         -0.67
ibi         -0.67
Sov         -0.67
ignant      -0.66
atan        -0.66
ASC         -0.65
hov         -0.64
Positive Logits
Token           Logit
hopefully       1.05
romeda          0.96
thereby         0.95
thus            0.95
secondly        0.89
lifestyles      0.87
consequently    0.87
enjoy           0.86
interacts       0.85
hence           0.84
Activations Density: 0.422%