Explanations
references to food or drinks that are pleasant to consume, particularly those that are easily enjoyable
references to the term "pal" and its variants in various contexts, often relating to companionship or casual relationships
Negative Logits
Token          Logit
Annotations    -0.71
CDC            -0.68
Consent        -0.68
DERR           -0.67
ECH            -0.67
ãĥ¼ãĥĨãĤ£      -0.67
à¼             -0.65
REE            -0.65
VOL            -0.65
BOX            -0.64
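Token strings such as `ãĥ¼ãĥĨãĤ£` and `à¼` in the list above look like byte-level BPE display forms rather than readable text. The following is a minimal sketch for turning them back into text, assuming the vocabulary uses the GPT-2-style byte-to-character mapping (the page itself does not name the tokenizer, so this is an assumption). The helper name `decode_token` is hypothetical.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE.

    Assumption: the dashboard's tokenizer follows this convention.
    """
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Hypothetical helper: map a displayed token back to UTF-8 text.

    Incomplete byte sequences (a token may end mid-character) are
    rendered with the Unicode replacement character.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ¼ãĥĨãĤ£"))  # prints ーティ
```

Under this assumption, `ãĥ¼ãĥĨãĤ£` is the katakana fragment `ーティ`, while `à¼` is only the first two bytes of a three-byte character and cannot be decoded on its own.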
Positive Logits
Token     Logit
atable    1.31
atial     1.10
estine    1.02
adin      0.99
mares     0.92
mop       0.87
pit       0.87
pal       0.85
itive     0.83
ours      0.82
Activations Density 0.012%
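As a rough illustration of what a density figure like this can mean, the sketch below computes the share of sampled tokens on which a feature fires. It assumes density is defined as the fraction of tokens with activation above zero; the page does not state the definition, and the function name `activation_density`, the threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: this matches the dashboard's notion of density.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: the feature fires on 12 of 100,000 tokens.
sample = [0.0] * 99_988 + [1.5] * 12
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.012%
```

Read this way, a density of 0.012% means the feature is active on roughly 12 of every 100,000 tokens, which would make it a very sparse feature.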