Explanations
discussions about variations in terminology and their implications
Negative Logits
oldem        -0.08
Touches      -0.07
idle         -0.07
ostel        -0.07
Translated   -0.07
KANJI        -0.06
translation  -0.06
hsi          -0.06
-Language    -0.06
osl          -0.06
Positive Logits
usage        0.11
Usage        0.10
Usage        0.10
USAGE        0.09
_usage       0.08
usage        0.08
USAGE        0.07
норматив     0.07
.usage       0.07
ayn          0.06
Activations Density 0.010%