INDEX
Explanations
references to overarching themes or concepts
Negative Logits
sville    -0.16
forth     -0.14
نین       -0.14
forth     -0.14
ido       -0.13
زي        -0.13
Shard     -0.13
uilder    -0.13
ibir      -0.13
เฉพาะ     -0.13
Positive Logits
bigger    0.42
larger    0.34
broader   0.34
Larger    0.30
big       0.30
wider     0.29
picture   0.28
宏        0.28
macro     0.27
macros    0.27
Activations Density 0.133%