INDEX
Explanations
specific unicode characters or character combinations
instances of a specific character or symbol within the text
Negative Logits
anooga          -0.71
chrom           -0.68
alien           -0.68
readiness       -0.66
chang           -0.62
apes            -0.61
Blacks          -0.60
partnerships    -0.60
Bounty          -0.60
magnitude       -0.60
Positive Logits
Ľ               1.35
ª               0.92
ł               0.89
¸               0.89
¾               0.88
׾              0.88
Ùħ              0.85
ľ               0.85
å§«             0.84
Ùĩ              0.84
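The positive-logit tokens above look like byte-level BPE display strings rather than ordinary text, which would fit the "specific unicode characters" explanation. The sketch below assumes, without confirmation from this page, that the tokens use the GPT-2-style byte-to-character table. Under that assumption it maps each displayed token back to its raw bytes and tries to decode them as UTF-8. The helper names `token_to_bytes` and `describe` are hypothetical, introduced only for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE.

    Assumption: the dashboard displays tokens with this table. Bytes that are
    already printable map to themselves; the rest are shifted to U+0100 and up.
    """
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed token string."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def describe(token: str):
    """Return (hex bytes, decoded text or None if not valid UTF-8 on its own)."""
    raw = token_to_bytes(token)
    hex_bytes = " ".join(f"{b:02x}" for b in raw)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = None  # e.g. a lone continuation byte from a multi-byte character
    return hex_bytes, text


if __name__ == "__main__":
    # Tokens copied from the positive-logit list above.
    for tok in ["Ľ", "ª", "ł", "¸", "¾", "׾", "Ùħ", "ľ", "å§«", "Ùĩ"]:
        print(repr(tok), describe(tok))
```

If the assumption holds, the single-character tokens come out as lone UTF-8 continuation bytes (which do not decode on their own), and the multi-character tokens come out as complete non-Latin characters. That would be consistent with a feature that promotes the trailing bytes of multi-byte Unicode sequences.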
Activations Density 0.005%