INDEX
Explanations
terms related to architecture and architectural design
Negative Logits
oo        -0.19
ager      -0.15
airo      -0.14
ara       -0.14
oad       -0.14
erator    -0.14
114       -0.14
621       -0.14
hood      -0.14
622       -0.14
Positive Logits
urally    0.29
ural      0.23
ivist     0.20
atron     0.19
essel     0.19
/arch     0.19
itect     0.18
sư        0.17
urve      0.17
URAL      0.17
Activations Density: 0.016%