Explanations
references to the concept of 'origin'
Negative Logits
Token   Logit
</i>    -0.81
</b>    -0.80
dymyr   -0.71
Monza   -0.67
Samoa   -0.67
Waw     -0.65
owulf   -0.65
<b>     -0.63
Chham   -0.62
s       -0.62
Positive Logits
Token    Logit
Origin   2.06
origin   2.03
Origin   2.00
origin   2.00
ORIGIN   1.78
Origins  1.77
origins  1.76
ORIGIN   1.66
Origins  1.66
origins  1.60
Activations Density: 0.068%