Explanations
punctuation marks that express strong emotions or emphasis
Negative logits
Token        Logit
.",          -0.81
\",          -0.79
]            -0.75
"=>"         -0.75
]").         -0.74
,}           -0.73
"],          -0.73
".           -0.73
*/,          -0.73
"]/          -0.73
Positive logits
Token         Logit
?!            0.70
?!?!          0.66
?!?           0.63
rrrrr         0.62
??            0.59
environments  0.58
rrrrrr        0.58
!?            0.58
↵↵↵↵↵↵        0.58
للاسماء       0.58
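Dashboards of this kind commonly build both lists by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative tokens. The sketch below assumes that convention; it is not confirmed by this page. The vocabulary, decoder vector, and unembedding matrix are hypothetical toy values, not the data behind the lists above.

```python
# Minimal sketch, ASSUMING logit score(token) = decoder_dir · W_U[:, token].
# Every value below is hypothetical toy data.

def logit_effects(decoder_dir, w_u):
    """Project a feature's decoder direction onto each token's unembedding column.

    decoder_dir: list of d_model floats (hypothetical feature direction).
    w_u: d_model rows, each a list of vocab_size floats (hypothetical unembedding).
    Returns one score per vocabulary token.
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(scores, vocab, k):
    """Return the k highest- and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]                # most negative first
    top = list(reversed(ranked[-k:]))  # most positive first
    return top, bottom

vocab = ["?!", "??", ".", "],", "the"]
decoder_dir = [1.0, -0.5]
w_u = [
    [0.8, 0.6, -0.7, -0.6, 0.0],
    [0.2, 0.1,  0.4,  0.5, 0.0],
]
scores = logit_effects(decoder_dir, w_u)
top, bottom = top_and_bottom(scores, vocab, k=2)
print("positive:", [tok for tok, _ in top])     # ['?!', '??']
print("negative:", [tok for tok, _ in bottom])  # ['.', '],']
```

Under this assumption, a positive score means the feature pushes the model toward predicting that token and a negative score means it pushes away from it.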
Activation density: 0.108%
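Activation density is usually read as the fraction of sampled tokens on which the feature fires (activation above zero); at 0.108% that is roughly one token in 926. A minimal sketch under that assumption follows. The `threshold` parameter and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    ASSUMES density is a simple firing rate over a token sample;
    `threshold` is a hypothetical parameter (0.0 = any positive activation).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Hypothetical sample: 2 of 1000 tokens fire.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")      # 0.200%

# Converting the reported 0.108% into "one token in N".
print(round(1 / 0.00108))    # 926
```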