Explanations
references to physical walls and their interactions
wall in different languages
Negative Logits
Token       Logit
Rücks       -0.33
vloer       -0.33
tents       -0.33
KHR         -0.30
campsites   -0.30
lwjgl       -0.30
Stakes      -0.30
хь          -0.29
legte       -0.29
legt        -0.28
Positive Logits
Token       Logit
wall        0.93
wall        0.77
parede      0.76
Wall        0.70
Wall        0.69
parete      0.69
WALL        0.69
WALL        0.68
dinding     0.63
墙          0.63
Activations Density 0.011%