Explanations
references to feelings of joy and happiness
Negative Logits
Token            Logit
:✨               -0.50
<bos>            -0.47
authenticate     -0.46
InjectMocks      -0.44
CanadaChoose     -0.44
pective          -0.43
EClass           -0.42
induce           -0.41
acute            -0.41
ंदीखरीदारी       -0.40
Positive Logits
Token            Logit
joy              1.04
Joy              0.60
joy              0.60
enjoyment        0.60
alegría          0.59
Joy              0.59
joys             0.57
gioia            0.57
alegria          0.56
happiness        0.54
Activations Density: 0.207%
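
The page does not define how the density figure is computed. A minimal sketch, assuming it denotes the share of sampled token positions on which the feature's activation is above zero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires; the source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 2 of 1000 token positions.
sample = [0.0] * 998 + [3.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.207% would mean the feature fires on roughly 2 of every 1000 sampled token positions, which would be consistent with a sparse, topic-specific feature.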