Explanations
instances of the word "smile" and other expressions of happiness or friendliness
Negative Logits
jsdelivr    -0.61
ab          -0.60
near        -0.58
madu        -0.56
pod         -0.55
N           -0.55
Ed          -0.55
뷔          -0.54
In          -0.54
Pod         -0.54
Positive Logits
smile       3.11
smiles      2.72
Smile       2.64
Smile       2.46
smile       2.39
smiling     2.33
smiled      2.26
Smiles      2.18
Smiling     2.06
smiles      1.97
Activations Density 0.049%