Explanations
expressions of gratitude
Negative Logits
Token            Logit
Please           -0.37
Please           -0.36
quite            -0.36
unfortunately    -0.36
cute             -0.36
@@               -0.35
Pic              -0.35
Quite            -0.35
Geme             -0.35
interaction      -0.34
Positive Logits
Token            Logit
thankful         0.80
grateful         0.78
thankfully       0.67
Thankfully       0.67
Fortunately      0.66
rateful          0.63
ftagPool         0.62
append           0.61
Nutrient         0.61
glad             0.61
Activations Density 0.033%