INDEX
Explanations
numerical values or patterns mixed with special characters and letters
special characters or non-standard symbols in the text
Negative Logits
Token       Logit
enegger     -0.78
inement     -0.72
bombshell   -0.71
advis       -0.71
agents      -0.68
compr       -0.68
raints      -0.66
umers       -0.66
relevance   -0.65
isance      -0.65
Positive Logits
Token       Logit
ħ           1.14
ĩ           0.98
оÐ          0.93
Į           0.92
Ĩ           0.89
İ           0.88
âĸĦ         0.84
ª           0.83
Ī           0.81
Ħ           0.79
Activations Density: 0.007%