INDEX
Explanations
instances of significant actions or events involving strong physical interactions
Negative Logits
askell   -0.15
oust     -0.15
chant    -0.14
agli     -0.14
opia     -0.14
lem      -0.14
xCD      -0.14
lius     -0.14
stitute  -0.14
átor     -0.13
Positive Logits
ething      0.16
Directions  0.15
pra         0.15
notated     0.14
angel       0.14
raž         0.14
660         0.14
Directions  0.13
ved         0.13
ahlen       0.13
Activations Density 0.551%