INDEX
Explanations
expressions of satisfaction or contentment
expressions of gratitude or relief
Negative Logits
helicop      -0.83
improve      -0.72
heat         -0.68
Improve      -0.65
effic        -0.64
$$$$         -0.62
cend         -0.61
eas          -0.61
haz          -0.60
artifacts    -0.60
Positive Logits
glad            0.91
withstanding    0.72
imar            0.72
imaru           0.72
ness            0.72
dy              0.71
ा               0.69
terday          0.69
bringer         0.69
joy             0.69
Activations Density 0.016%