INDEX
Explanations
single quotes and punctuation marks, indicating a focus on direct quotes or dialogue within the text
numbers and specifications
Negative Logits
Token          Logit
but            -0.31
Fortunately    -0.26
.              -0.25
what           -0.25
either         -0.25
Luckily        -0.24
Forschungs     -0.23
after          -0.23
Fortunately    -0.22
rağmen         -0.22
Positive Logits
Token             Logit
GEBURTSDATUM      0.99
autorytatywna     0.97
noDo              0.90
<<<<<<<<<<<<<<    0.88
Normdatei         0.85
:✨                0.85
<pad>             0.83
<unused51>        0.82
<unused43>        0.82
<unused23>        0.82
Activations Density 0.434%