INDEX
Explanations
words with non-English characters, potentially related to a specific language or encoding system
special characters or sequences in a document
Negative Logits
geries        -1.04
foreground    -0.83
distracted    -0.75
distracting   -0.71
background    -0.70
dividing      -0.68
intrig        -0.67
gery          -0.67
Catalyst      -0.66
raints        -0.66
Positive Logits
и             1.12
ļ             1.06
Ħ             1.06
ij             1.04
о             0.98
а             0.95
à¸            0.94
Ñĥ            0.94
ski           0.94
ł             0.94
Activations Density 0.006%
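
Several of the positive-logit tokens above (ļ, Ħ, ij, ł, à¸, Ñĥ) look like the printable stand-ins that a GPT-2-style byte-level BPE vocabulary uses for raw bytes. The page does not name the model or tokenizer, so that is an assumption. Under it, the sketch below rebuilds the standard byte-to-character table and maps each token back to its bytes, which shows whether a token is a whole UTF-8 character or only a fragment of one. The helper names `token_to_bytes` and `describe` are illustrative, not part of any library.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as used by GPT-2-style byte-level BPE.

    ASSUMPTION: the dashboard's tokenizer follows this scheme; the source
    page does not say so.
    """
    # Bytes that are already printable keep their own code point.
    bs = (list(range(0x21, 0x7F))      # '!' .. '~'
          + list(range(0xA1, 0xAD))    # U+00A1 .. U+00AC
          + list(range(0xAE, 0x100)))  # U+00AE .. U+00FF
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point from U+0100 up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Map a byte-level token string back to the raw bytes it stands for."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def describe(token):
    """Return (hex bytes, decoded text or None if the bytes are a fragment)."""
    raw = token_to_bytes(token)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = None  # incomplete or stray UTF-8 bytes
    return raw.hex(), text


# Tokens from the positive-logit list, written as escapes to avoid
# copy/paste encoding problems. Tokens the page already shows decoded
# (the plain Cyrillic letters) are not in the table and are left out.
tokens = ["\u013c", "\u0126", "\u0133", "\u0142", "\u00e0\u00b8", "\u00d1\u0125"]
for tok in tokens:
    print(repr(tok), describe(tok))
```

Under this assumption, Ñĥ maps to bytes `d1 83`, which decode to the Cyrillic letter у, and à¸ maps to `e0 b8`, the two lead bytes shared by Thai characters U+0E00 to U+0E3F. The single-character tokens map to lone continuation bytes (`9a`, `84`, `91`, `a0`) and do not decode on their own. That reading fits the listed explanation about non-English characters.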