INDEX
Explanations
phrases or sentences with specific characters or symbols like 'Ŀ'
the presence of specific formatting or structural elements in the text
Negative Logits
incorpor      -0.80
ende          -0.76
controvers    -0.67
rundown       -0.67
notor         -0.67
obser         -0.66
mathemat      -0.66
range         -0.64
inactive      -0.63
secretaries   -0.63
Positive Logits
ï¸ı     1.18
ÃĽ      0.91
¯       0.87
ï¸      0.86
âĢł     0.83
°       0.82
âĻ      0.81
cause   0.81
âľ      0.80
#$      0.80
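
Several of the positive-logit tokens above look like mojibake, but they are consistent with how a byte-level BPE vocabulary displays raw bytes. The sketch below assumes (the page does not state this) that the model uses the GPT-2-style byte-to-character table. Under that assumption each displayed character maps back to one byte, and the bytes can then be decoded as UTF-8. The helper names `token_to_bytes` and `describe` are hypothetical and are not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the feature's model uses this table; the source page does not say so.
    """
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed token string (hypothetical helper)."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def describe(token: str) -> str:
    """Return the decoded text, or a note when the bytes are an incomplete UTF-8 sequence."""
    raw = token_to_bytes(token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return f"<partial UTF-8: {raw.hex()}>"


if __name__ == "__main__":
    for tok in ["ï¸ı", "ÃĽ", "¯", "ï¸", "âĢł", "°", "âĻ", "cause", "âľ", "#$"]:
        print(repr(tok), "->", repr(describe(tok)), token_to_bytes(tok).hex())
```

If the assumption holds, `ï¸ı` corresponds to the bytes `ef b8 8f` (U+FE0F, variation selector-16) and `âĢł` to `e2 80 a0` (the dagger, U+2020), while tokens such as `ï¸`, `âĻ`, and `âľ` are incomplete UTF-8 prefixes of multi-byte symbols. That reading agrees with the first explanation, which describes text containing specific characters or symbols.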
Activations Density 0.153%