Explanations
code comments and definitions
Negative Logits
Token    Value
self     1.10
self     1.03
Self     0.82
само     0.82
自       0.77
Self     0.76
zelf     0.73
själv    0.70
עצ       0.68
SELF     0.68
Positive Logits
Token    Value
/**      0.70
/**      0.60
/**/*    0.52
!***     0.52
$$\      0.46
"""      0.45
ث        0.43
""".     0.43
τρα      0.42
"""      0.41
Activations Density 0.008%