Explanations
mentions of the name "Yo" at varying strengths of activation
mentions of a specific person or character, particularly associated with the name "Yo."
Negative Logits
Token                      Logit
ioned                      -0.86
lain                       -0.84
ãĥ¼ãĥĨ (ーテ)              -0.73
limited                    -0.71
riott                      -0.70
eele                       -0.70
lessly                     -0.68
ãĥ¼ãĥĨãĤ£ (ーティ)         -0.68
è¦ļéĨĴ (覚醒)              -0.68
ãĥģ (チ)                   -0.67
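Several entries in the negative-logit list (for example ãĥ¼ãĥĨ and è¦ļéĨĴ) look like byte-level BPE vocabulary strings rather than readable text. The sketch below assumes the model uses a GPT-2-style byte-to-character table, which this page does not confirm. Under that assumption, each character maps back to one byte and the bytes decode as UTF-8. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are shifted to code points 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: vocabulary character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a byte-level vocabulary string back to UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Partial multi-byte sequences show up as U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥ¼ãĥĨ", "ãĥ¼ãĥĨãĤ£", "è¦ļéĨĴ", "ãĥģ", "limited"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as `limited` pass through unchanged, so the same helper can be applied to every row.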
Positive Logits
Token                      Logit
Yo                         1.27
Yo                         0.98
eli                        0.97
ichi                       0.83
Da                         0.76
Mama                       0.75
bean                       0.75
yo                         0.74
gee                        0.73
Huh                        0.73
Activations Density 0.003%
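As a rough illustration of what a density of 0.003% means, the sketch below treats activation density as the fraction of sampled tokens on which the feature fires. That definition and the zero threshold are assumptions, not taken from this page, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Made-up sample: 3 active tokens out of 100,000.
    sample = [0.0] * 99_997 + [1.2, 0.4, 2.0]
    print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.003% corresponds to roughly 3 active tokens per 100,000.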