INDEX
Explanations
references to previously established or current entities and systems
Negative Logits
ings      -0.21
newly     -0.17
modern    -0.15
new       -0.15
recent    -0.14
s         -0.14
latest    -0.14
headed    -0.14
former    -0.14
owl       -0.14
Positive Logits
/current        0.25
/original       0.23
/new            0.20
ones            0.19
Ones            0.17
-established    0.16
-generation     0.15
/up             0.15
tlement         0.15
zer             0.15
Activations Density 0.030%