Explanations
references to the "Next Generation" or similar sequential concepts
Negative Logits
abler       -0.18
uten        -0.16
ician       -0.15
ishments    -0.15
ilate       -0.15
aghan       -0.15
rift        -0.14
wers        -0.14
head        -0.14
defaults    -0.14
Positive Logits
door           0.25
door           0.21
-generation    0.20
-door          0.19
ernal          0.19
GEN            0.19
Generation     0.17
week           0.17
-gen           0.16
ural           0.16
Activation density: 0.028%
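Activation density is usually read as the percentage of tokens on which the feature fires. The sketch below assumes exactly that definition: a token counts as active when its activation exceeds a threshold, taken here to be 0.0. The function name `activation_density`, the `threshold` parameter, and the example activations are all hypothetical, used only to show the arithmetic behind a figure like 0.028%.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: `activations` is a flat sequence of per-token feature
    activations, and "active" means strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 3 of 10,000 tokens.
acts = [0.0] * 10_000
acts[10] = acts[500] = acts[9_000] = 1.2
print(f"{activation_density(acts):.3f}%")  # 0.030%
```

Under this definition, a density of 0.028% means the feature is active on roughly 28 of every 100,000 tokens, so it fires very sparsely.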