INDEX
Explanations
mentions of linear concepts or terms, particularly in mathematical or technical contexts
Negative Logits
amax        -0.21
nonlinear   -0.20
ennis       -0.18
anela       -0.17
ing         -0.17
ENTE        -0.16
Архів       -0.16
yre         -0.15
ingroup     -0.15
ente        -0.15
Positive Logits
ly          0.41
ized        0.35
ization     0.32
izing       0.28
izable      0.28
ize         0.27
ities       0.27
ised        0.26
isation     0.24
izes        0.23
Activations Density 0.014%