Explanations
words followed by punctuation
Negative Logits
䚯  0.85
सीएचएसएल  0.80
㪇  0.80
<unused1738>  0.80
蟶  0.80
<unused231>  0.78
ὺς  0.77
<unused1666>  0.76
昍  0.76
琟  0.76
Positive Logits
<eos>  3.73
៕  2.11
<end_of_turn>  1.99
↵↵↵↵↵↵↵↵  1.87
↵↵↵↵↵↵↵↵↵↵  1.79
↵↵↵↵↵↵↵↵↵  1.78
↵↵↵↵↵↵↵↵↵↵↵↵  1.75
↵↵↵↵↵↵↵↵↵↵↵↵↵  1.74
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  1.72
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  1.71
Activations Density: 0.707%