INDEX
Explanations
names with a common pattern, likely related to a specific person or topic
the presence of the substring "iy" within words
Negative Logits
mint          -0.78
IBLE          -0.77
âĹ¼           -0.75
olkien        -0.68
inates        -0.66
ufact         -0.66
Else          -0.65
ãĥ¼ãĥ«        -0.64
wcs           -0.64
Hurricanes    -0.64
Positive Logits
yah           1.23
azaki         0.94
yy            0.94
adh           0.92
ielding       0.91
oko           0.90
ya            0.89
atana         0.89
ota           0.89
oji           0.88
Activations Density 0.025%