INDEX
Explanations
words signaling a positive outcome or solution
expressions of relief or gratitude
Negative Logits
gate        -0.71
pled        -0.67
QL          -0.67
76561       -0.67
angles      -0.62
agg         -0.62
kindred     -0.61
dq          -0.61
omin        -0.61
utenberg    -0.61
Positive Logits
fortunately    0.93
nown           0.83
terday         0.82
luckily        0.77
enough         0.71
thankfully     0.71
Fortunately    0.70
Thankfully     0.70
ESA            0.69
phans          0.65
Activations Density 0.014%