Explanations
numeric or symbolic characters that appear at the end of words
sequences or symbols that signify a particular emphasis or pattern, likely related to coded or specialized language
Negative Logits
Token          Logit
Mub            -0.74
bda            -0.72
rake           -0.71
Downloadha     -0.68
disenfranch    -0.65
ukong          -0.63
bucks          -0.62
Peel           -0.61
warr           -0.60
levers         -0.60
Positive Logits
Token          Logit
ħ              1.11
Ĭ              0.99
¡              0.99
Į              0.98
İ              0.96
Û              0.92
ŀ              0.88
Ĩ              0.87
Ĥª             0.85
¾              0.85
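The logit lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such a ranking can be produced, assuming (this is a common convention for SAE feature dashboards, not something stated here) that each value is the feature's decoder direction projected through the model's unembedding matrix. The function names and the toy vocabulary, decoder direction, and unembedding values are hypothetical illustrations, not the dashboard's actual code or data.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a decoder direction (length d_model)
    through an unembedding matrix (d_model rows x vocab_size columns),
    giving one logit contribution per vocabulary token."""
    vocab_size = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab_size)]

def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most positive and k most negative tokens."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom

# Toy example: d_model = 2, vocabulary of four made-up tokens.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -1.0, 0.5, 0.0],
           [0.4, 0.0, -1.0, 0.6]]
effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

Real dashboards may differ in details such as whether layer-norm scaling is folded in, so the sketch only illustrates the ranking step, not the exact magnitudes shown here.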
Activation Density: 0.020%
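An activation density of 0.020% corresponds to the feature firing on roughly 1 in 5,000 tokens. A minimal sketch of the computation, assuming density means the fraction of sampled tokens on which the feature's activation is strictly greater than zero; the function name, threshold parameter, and sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of tokens whose activation exceeds threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Made-up sample: 10,000 tokens, feature fires on 2 of them.
acts = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.020%
```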