Explanations
instances of the word 'sa' with varying degrees of activation
instances of the word "sa," indicating a focus on the concept of seasoning or flavoring
Negative Logits
Token       Logit
ーティ      -0.84
INGTON      -0.82
enegger     -0.76
hips        -0.74
itarian     -0.72
etheless    -0.69
ワン        -0.69
hyde        -0.68
姫          -0.67
LIN         -0.66
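
Three of the negative-logit tokens are Japanese. In a raw vocabulary dump they appear as byte-level strings: 姫 is stored as å§«, ワン as ãĥ¯ãĥ³, and ーティ as ãĥ¼ãĥĨãĤ£. The sketch below recovers the readable text. It assumes the tokenizer uses a GPT-2-style byte-to-unicode alphabet, which the dump itself does not state. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the source does not name the tokenizer)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is shifted into the range starting at U+0100.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw_token: str) -> str:
    """Turn a byte-level vocabulary string back into readable text."""
    raw_bytes = bytes(CHAR_TO_BYTE[ch] for ch in raw_token)
    # Tokens may end mid-character, so replace undecodable bytes
    # instead of raising.
    return raw_bytes.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for raw in ["ãĥ¼ãĥĨãĤ£", "ãĥ¯ãĥ³", "å§«", "hyde"]:
        print(repr(raw), "->", repr(decode_token(raw)))
```

Plain ASCII tokens such as hyde pass through unchanged, so the same helper can be applied to every row of both logit lists.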
Positive Logits
Token       Logit
pling       1.23
ven         1.08
plings      1.00
uth         1.00
arin        0.98
ucer        0.98
llan        0.97
ppy         0.96
uten        0.95
pp          0.93
Activation Density: 0.006%