INDEX
Explanations
sentences with significant numerical values or high frequency indicating importance
Negative Logits
esternos        -0.96
ReusableCell    -0.91
للمعارف         -0.89
<pad>           -0.85
<unused79>      -0.85
<unused47>      -0.85
<unused41>      -0.85
Infórmanos      -0.85
<unused52>      -0.85
<unused8>       -0.85
Positive Logits
↵↵              0.47
The             0.38
Besides         0.34
Though          0.33
Furthermore     0.32
</u>            0.31
</i>            0.31
Besides         0.30
Though          0.30
The             0.29
Activations Density 0.719%