Explanations
Terms related to learning and educational processes.
NEGATIVE LOGITS
ural      -0.17
è¨        -0.16
aji       -0.16
ewater    -0.15
uro       -0.14
sy        -0.14
ims       -0.14
jes       -0.13
bis       -0.13
reh       -0.13
POSITIVE LOGITS
andest         0.17
.Meta          0.16
Wunused        0.16
/Instruction   0.16
nez            0.15
quan           0.15
mite           0.15
ضاء            0.15
_utilities     0.15
ÑĤаб           0.14
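Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k highest and k lowest scoring tokens (k = 10 here). The page does not say how its values were computed, so the sketch below is an assumption; `decoder_dir`, `unembed`, and `vocab` are hypothetical stand-ins for the feature's decoder row, the unembedding matrix, and the tokenizer vocabulary.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Return (negative, positive) lists of (token, score) pairs.

    Assumed inputs (hypothetical names):
      decoder_dir: list of d_model floats, the feature's decoder direction
      unembed:     d_model rows, each a list of len(vocab) floats (W_U)
      vocab:       list of token strings, one per unembedding column
    """
    vocab_size = len(vocab)
    # score[t] = dot(decoder_dir, unembed[:, t]): the feature's direct effect on token t's logit
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]
    # Sort token ids from most negative to most positive score
    order = sorted(range(vocab_size), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    # Most positive first, matching the dashboard's descending layout
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy usage: d_model = 2, vocabulary of 4 tokens
neg, pos = top_logit_tokens(
    [1.0, -0.5],
    [[0.1, 0.2, -0.3, 0.0], [0.4, -0.2, 0.0, 0.1]],
    ["a", "b", "c", "d"],
    k=2,
)
print([t for t, _ in neg], [t for t, _ in pos])
```

In the toy call the scores are roughly a = -0.1, b = 0.3, c = -0.3, d = -0.05, so the most negative tokens are `c` and `a` and the most positive are `b` and `d`.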
Activation density: 0.041%
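Activation density is usually read as the fraction of token positions on which the feature fires; 0.041% works out to roughly 1 token in 2,400. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (both the threshold and the function name are assumptions, not taken from the page):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 4 firing positions out of 10,000 tokens
acts = [0.0] * 9996 + [1.2, 0.5, 3.0, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.040%"
```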