INDEX
Explanations
themes related to the complexity of human experiences and emotions
Negative Logits
“        -0.60
”,       -0.50
(“       -0.49
“,       -0.46
“[       -0.44
”.↵↵     -0.43
’,       -0.43
“â̦      -0.41
âĢŀ      -0.41
”.↵      -0.40
Positive Logits
."       0.31
."'      0.26
.")      0.25
";       0.25
."↵↵↵    0.24
."↵↵↵↵   0.24
.");     0.24
.â̦      0.23
."↵      0.23
."+      0.21
Activations Density 0.293%