INDEX
Explanations
terms related to information dissemination and instruction
Negative Logits
INY      -0.17
iny      -0.17
inders   -0.16
oras     -0.15
gear     -0.15
uliar    -0.15
AZY      -0.15
з        -0.14
grade    -0.14
ownt     -0.14
Positive Logits
ally     0.31
atics    0.31
ative    0.25
ercial   0.24
atica    0.24
ants     0.21
ALLY     0.19
ática    0.18
ATIVE    0.18
¹        0.17
Activations Density: 0.027%