INDEX
Explanations
letters and symbols related to non-English languages like Turkish
Negative Logits
Token      Logit
Perkins    -0.74
enegger    -0.72
WARD       -0.71
Binary     -0.67
Mandela    -0.66
Stard      -0.66
ifying     -0.65
mort       -0.65
mutual     -0.64
Silk       -0.64
Positive Logits
Token      Logit
ĥ          1.75
ķ          1.53
Ĵ          1.53
Ĺ          1.51
Ń          1.50
Ģ          1.50
ī          1.49
İ          1.49
Į          1.48
ħ          1.46
Activations Density 0.008%
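As a rough illustration of what a density figure like this usually means, the sketch below treats it as the fraction of token positions where the feature's activation is above zero. That definition, the `activation_density` helper, and the synthetic activations are assumptions made for the example, not details taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    ASSUMPTION: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: 100,000 token positions, with the feature firing on 8 of them.
acts = [0.0] * 100_000
for position in (5, 1_200, 33_000, 47_001, 58_500, 71_234, 88_888, 99_999):
    acts[position] = 1.3

density = activation_density(acts)
print(f"{density:.3%}")
```

On this reading, a density of 0.008% corresponds to the feature firing on roughly 8 out of every 100,000 tokens, which is a very sparse feature.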