Explanations
instances of the word "happy" or related forms and their variations
Negative Logits
Token           Logit
eer             -0.16
Boss            -0.15
Äįil            -0.14
ead             -0.14
ät              -0.14
.scalablytyped  -0.14
ichel           -0.14
ease            -0.14
uria            -0.14
ansson          -0.14
Positive Logits
Token           Logit
ily              0.30
iness            0.20
INESS            0.20
iest             0.20
ening            0.19
stance           0.17
happy            0.17
Happ             0.17
illy             0.17
yme              0.17
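Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token values. This page does not state its method, so treat the following as a minimal sketch under that assumption. The function names (`logit_effects`, `top_logits`) are hypothetical, and the weights are invented toy numbers, not the model's.

```python
# Sketch: rank tokens by how much a feature's decoder direction pushes their logits.
# Assumption: logit effect for token t = dot(decoder_direction, W_U[:, t]).
# All weights below are made-up toy values for illustration only.

def logit_effects(decoder_direction, W_U, vocab):
    """Return {token: logit effect}. W_U has d_model rows, one column per token."""
    effects = {}
    for t, token in enumerate(vocab):
        effects[token] = sum(d * row[t] for d, row in zip(decoder_direction, W_U))
    return effects

def top_logits(effects, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]                    # most negative first
    positive = list(reversed(ranked[-k:]))   # most positive first
    return positive, negative

vocab = ["happy", "iness", "ily", "Boss", "eer"]
W_U = [  # toy 3 x 5 unembedding (d_model=3, vocab size=5)
    [0.5, 0.3, 0.4, -0.2, -0.1],
    [0.2, 0.1, 0.3, 0.0, -0.3],
    [0.0, 0.2, 0.1, -0.4, 0.1],
]
decoder_direction = [0.6, 0.2, 0.1]  # toy decoder vector for one feature

positive, negative = top_logits(logit_effects(decoder_direction, W_U, vocab), k=2)
print("positive:", [(t, round(v, 2)) for t, v in positive])
print("negative:", [(t, round(v, 2)) for t, v in negative])
```

With these toy weights, "happy" and "ily" come out on top and "Boss" and "eer" at the bottom; the real lists would come from the model's actual decoder and unembedding weights.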
Activation Density: 0.008% (share of tokens on which the feature fires)
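A density of 0.008% works out to roughly 8 activating tokens per 100,000, so this is a very sparse feature. The sketch below assumes that density means the percentage of sampled tokens whose activation exceeds zero. The `threshold` parameter and the sample data are hypothetical, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Invented example: a feature that fires on 8 of 100,000 tokens.
acts = [0.0] * 100_000
for i in (10, 500, 2_000, 9_999, 31_415, 50_000, 77_777, 99_999):
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # prints 0.008%
```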