Explanations
proper nouns, specifically names and titles
Negative Logits
ä            -0.17
çķ           -0.17
é            -0.16
lu           -0.16
adam         -0.15
yal          -0.15
/Foundation  -0.15
tah          -0.15
tent         -0.15
l            -0.15
Positive Logits
arcs    0.17
csi     0.16
ció     0.15
asan    0.15
conomy  0.15
cs      0.14
plete   0.14
lyn     0.14
iag     0.14
gb      0.14
Activations Density 0.002%