INDEX
Explanations
occurrences of specific punctuation and white space characters, likely focusing on formatting or structuring of the text
numbered list item
Negative Logits
SharedDtor  -0.92
queſta  -0.81
transfieras  -0.78
хьтан  -0.75
ویکیآمباردا  -0.75
صوتيه  -0.73
चीज़ों  -0.73
Jeografia  -0.71
ſind  -0.70
Бахар  -0.69
Positive Logits
hipó  0.43
UC  0.40
Bob  0.40
1  0.40
[token not shown]  0.39
Service  0.38
Bob  0.38
↵↵  0.38
The  0.37
Exit  0.37
Activations Density 0.356%