Explanations
instances of the start of a document or significant breakpoints in text
Negative Logits
Token            Logit
autorytatywna    -1.37
IUrlHelper       -1.16
Autoritní        -1.08
mergeFrom        -1.04
rungsseite       -0.98
afficheront      -0.97
مرئيه            -0.95
Савезне          -0.94
newBuilder       -0.94
Biôgrafia        -0.93
Positive Logits
Token                   Logit
↵↵                      1.02
↵                       0.83
↵↵↵                     0.69
(token not captured)    0.62
↵↵↵↵                    0.62
[toxicity=0]            0.53
<eos>                   0.51
.                       0.50
↵↵↵↵↵↵                  0.47
parem                   0.45
Activations Density: 0.149%