Explanations
culinary-related words, potentially referring to dishes or ingredients
words with specific character sequences or patterns
Negative Logits
trainers     -0.67
writ         -0.67
mathemat     -0.67
landmarks    -0.65
etsk         -0.63
runners      -0.63
eatures      -0.63
explan       -0.62
nour         -0.62
myster       -0.61
Positive Logits
ï¸ı          1.22
vernment     0.97
lean         0.92
MQ           0.88
ï¸           0.87
log          0.85
ãĥĥãĥī       0.83
Ģ            0.82
ļ            0.81
leans        0.80
Activations Density: 0.035%