Explanations
non-English characters, potentially in a specific language or encoding
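Many of the logit tokens in this section look like mojibake because they appear to be shown in the vocabulary form of a byte-level BPE tokenizer. In that form every raw byte is mapped to a printable character. The section does not name the model, so the sketch below assumes a GPT-2-style byte-level tokenizer. It rebuilds the byte-to-character table following the construction in OpenAI's GPT-2 `encoder.py`, inverts it, and tries to decode a few of the listed tokens as UTF-8. The helper names `token_to_bytes` and `describe` are hypothetical. "α" is omitted because it is not one of the 256 byte-level symbols.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2-style reversible byte -> printable-character table."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (0x00-0x20, 0x7F-0xA0, 0xAD) is shifted to
    # code point 256 + n so it stays visible in the vocabulary.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: displayed character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Map a byte-level token string back to the raw bytes it stands for."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def describe(token: str) -> str:
    """Decode the token as UTF-8, or show its bytes if it is only a fragment."""
    raw = token_to_bytes(token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 on its own, e.g. a lone continuation byte.
        return "partial UTF-8: " + raw.hex(" ")


for tok in ["ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³", "ãģį", "ħ", "¾", "Į", "İ", "ŀ"]:
    print(repr(tok), "->", describe(tok))
```

Under this assumption, the Japanese negative-logit token decodes to サーティワン (katakana for "thirty-one") and "ãģį" decodes to き. Most single-character positive-logit tokens decode to a lone UTF-8 continuation byte (0x80 to 0xBF), which is a fragment of a multi-byte character rather than a complete one.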
Negative Logits
illas               -0.69
illa                -0.68
aic                 -0.67
eers                -0.67
logger              -0.66
olves               -0.64
stadt               -0.63
ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³  -0.61
chau                -0.59
levers              -0.58
Positive Logits
ħ     0.94
¾     0.85
¼     0.84
Į     0.84
ãģį   0.83
ttp   0.81
İ     0.78
α     0.76
Ĩ     0.76
ŀ     0.75
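The section does not say how the logit scores are computed. This sketch assumes one common approach in sparse-autoencoder feature dashboards: project the feature's decoder direction through the model's unembedding matrix, then list the most positive and most negative entries. Any normalization the dashboard applies is not reproduced here. The example uses pure Python with a made-up two-dimensional model and a four-token vocabulary. `logit_weights`, `top_tokens`, and all numbers are hypothetical and do not come from the source.

```python
import heapq


def logit_weights(decoder_vec: list[float], w_u: list[list[float]]) -> list[float]:
    """Project a feature's decoder direction through the unembedding.

    decoder_vec has length d_model; w_u is d_model x vocab_size, so the
    result holds one score per vocabulary token.
    """
    vocab_size = len(w_u[0])
    return [sum(decoder_vec[i] * w_u[i][t] for i in range(len(decoder_vec)))
            for t in range(vocab_size)]


def top_tokens(scores: list[float], vocab: list[str], k: int):
    """Return the k most positive and k most negative (token, score) pairs."""
    pairs = list(zip(vocab, scores))
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["a", "b", "c", "d"]
w_u = [[1.0, -1.0, 0.5, 0.0],
       [0.0, 2.0, -0.5, 1.0]]
decoder_vec = [0.5, -0.25]

scores = logit_weights(decoder_vec, w_u)
pos, neg = top_tokens(scores, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Under this reading, the positive list names tokens whose logits the feature pushes up when it fires, and the negative list names tokens it pushes down.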
Activation Density: 9.747%
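This sketch assumes that activation density is the fraction of sampled tokens on which the feature's activation is above zero. The source does not state the sample size or the threshold. The function name `activation_density` and its `threshold` parameter are hypothetical, and the activations are toy values. Under this reading, 9.747% means the feature fires on roughly one token in ten.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations over 8 tokens; 2 of them are above zero.
acts = [0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0]
print(f"{activation_density(acts):.3%}")  # 25.000%
```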