Explanations
verbs that indicate destruction or action towards a final result
words or phrases associated with emotional distress or negative experiences
Negative Logits
Token        Logit
"            -0.74
("           -0.72
"@           -0.72
"_           -0.72
"...         -0.68
"#           -0.67
Accessory    -0.66
initions     -0.65
—"           -0.63
"[           -0.61
Positive Logits
Token        Logit
',"          2.62
,'           2.57
,'"          2.52
',           2.44
').          2.44
'."          2.44
']           2.44
'"           2.42
')           2.41
';           2.39
Activations Density 0.258%