INDEX
Explanations
instances where someone is making a decision based on weighing potential risks
Negative Logits
Token      Logit
ļéĨĴ       -0.77
visory     -0.77
ãĥĥãĥĪ     -0.75
scribe     -0.74
emaker     -0.71
blem       -0.70
cedented   -0.70
vance      -0.70
ãĥ¥        -0.69
ãĤ¨ãĥ«     -0.69
Positive Logits
Token          Logit
huh            0.97
albeit         0.95
somew          0.92
eh             0.91
but            0.80
yeah           0.77
maybe          0.76
haha           0.76
nevertheless   0.73
Anyway         0.70
Activations Density 0.307%
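
Several tokens in the Negative Logits list (for example ãĥĥãĥĪ) look like the printable stand-ins that byte-level BPE tokenizers use for raw bytes. The sketch below assumes the GPT-2-style byte-to-character table. The page does not say which tokenizer produced these tokens, so treat that as an assumption. The helper name decode_display_token is hypothetical and exists only for illustration.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the page does not name its tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(0x21, 0x7F))      # '!' .. '~'
          + list(range(0xA1, 0xAD))    # U+00A1 .. U+00AC
          + list(range(0xAE, 0x100)))  # U+00AE .. U+00FF
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned the next code point from U+0100 upward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_display_token(token: str) -> str:
    """Hypothetical helper: map a displayed token back to UTF-8 text.

    Raises KeyError for characters outside the table (e.g. a literal space).
    Bytes that do not form valid UTF-8 become U+FFFD, which is expected for
    tokens that start or end in the middle of a multi-byte character.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")
```

Under that assumption, decode_display_token("ãĥĥãĥĪ") yields the katakana ット, while plain ASCII tokens such as but come back unchanged. A token like ļéĨĴ starts with a stray continuation byte, so only part of it decodes to a readable character.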