Explanations
motivational themes and personal growth narratives
Negative Logits
(?)        -0.26
???        -0.26
(?         -0.25
?????      -0.25
??         -0.20
.'</       -0.19
.”↵↵       -0.18
."↵↵↵↵     -0.18
."↵↵       -0.17
."↵↵↵      -0.17

Positive Logits
?↵         0.72
?          0.65
？↵         0.56
?↵↵        0.55
?"↵        0.54
?\r↵       0.50
?)↵        0.50
?”         0.49
؟↵         0.48
?↵↵↵↵      0.47
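Dashboards like this one often build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then ranking vocabulary tokens by the resulting score. The source does not say how these particular values were computed, so the sketch below assumes that method. It uses toy, hypothetical vectors instead of real model weights, and the helper names `feature_logits` and `top_and_bottom` are illustrative.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def feature_logits(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (assumed) feature decoder
    direction with that token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, col)) for tok, col in unembed.items()}

def top_and_bottom(logits: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    positive = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(logits.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Toy 2-dimensional example; the numbers are illustrative, not from the dashboard.
decoder_dir = [1.0, -0.5]
unembed = {
    "?\n": [0.6, -0.2],
    "?": [0.5, -0.1],
    "(?)": [-0.2, 0.1],
    '."\n\n': [-0.1, 0.1],
}
positive, negative = top_and_bottom(feature_logits(decoder_dir, unembed), k=2)
print(positive)
print(negative)
```

Under this assumption, a feature whose direction aligns with question-mark-plus-newline tokens would push those tokens up and push closing-quote-plus-newline tokens down, which matches the shape of the lists above.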
Activation Density: 2.013%
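Activation density is commonly reported as the percentage of sampled tokens on which the feature's activation is nonzero. A minimal sketch follows, assuming that definition, a zero threshold, and a hypothetical list of per-token activations. The source does not state its exact sampling or threshold.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: the feature fires on 2 of 100 tokens.
acts = [0.0] * 98 + [1.3, 0.4]
print(f"{activation_density(acts):.3f}%")  # prints 2.000%
```

A density near 2% therefore means the feature is active on roughly one token in fifty of the sampled text.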