INDEX
Explanations
references to walls or structures in text
NEGATIVE LOGITS
zelf        -0.17
eka         -0.16
/errors     -0.16
ois         -0.15
rana        -0.15
esus        -0.15
elli        -0.15
sak         -0.15
OrCreate    -0.15
oga         -0.15
POSITIVE LOGITS
abies       0.26
aby         0.25
flower      0.24
-mounted    0.24
owing       0.23
/window     0.23
ace         0.22
aver        0.21
papers      0.21
paper       0.21
Activations Density: 0.034%