INDEX
Explanations
expressions of gratitude and excitement related to support and achievements
expressions of gratitude or excitement
Negative Logits
prefers        -0.65
favoring       -0.62
ware           -0.62
beware         -0.62
respectively   -0.61
Bent           -0.58
ibaba          -0.57
Preferred      -0.56
ussed          -0.56
blamed         -0.55
Positive Logits
!]             0.89
querade        0.69
agos           0.68
ASA            0.64
!!!!!!!!       0.61
imity          0.60
rats           0.59
opportunity    0.59
],"            0.58
ouncing        0.58
Activations Density: 0.530%