Explanations
proper nouns associated with names and titles
Negative Logits
ÙĪØ§ÙĦ       -0.18
peare        -0.17
Heads        -0.16
adelphia     -0.14
stopwatch    -0.14
traits       -0.14
Tod          -0.14
lia          -0.14
ptions       -0.13
itive        -0.13
Positive Logits
iley         0.23
ba           0.18
lear         0.18
.debugLine   0.17
FTA          0.16
ically       0.16
üml          0.16
ground       0.15
ixo          0.15
ashboard     0.15
Activations Density: 0.010%