Explanations
references to dragons and their characteristics
Negative Logits
Token        Logit
rat          -0.17
opher        -0.15
spiders      -0.15
tas          -0.15
ãĥ¼ãĤ¸       -0.15   (byte-level encoding of ージ)
ware         -0.14
ocup         -0.14
ris          -0.14
FromArray    -0.14
rabbit       -0.14
Positive Logits
Token        Logit
dragon       0.27
dragon       0.26
dragons      0.25
ragon        0.25
Dragon       0.24
é¾           0.23   (incomplete UTF-8 prefix, bytes E9 BE, shared by 龍 and 龙)
Dragons      0.23
Dragon       0.22
é¾į          0.20   (byte-level encoding of 龍)
dracon       0.19
Activations Density 0.030%