Explanations
a variety of symbols or special characters
Negative Logits
Token          Logit
board          -0.95
enegger        -0.72
ographically   -0.71
fabrication    -0.69
ulators        -0.68
ierrez         -0.68
Fargo          -0.67
concede        -0.67
cautiously     -0.67
charm          -0.67
Positive Logits
Token          Logit
¹              1.77
ª              1.62
¨              1.60
£              1.56
Į              1.52
Ń              1.52
¢              1.52
¥              1.51
±              1.50
º              1.49
Activations Density: 0.002%