INDEX
Explanations
references to walls or barriers in various contexts
Negative Logits
temprana      -0.83
})));         -0.81
epam          -0.74
noires        -0.74
mijne         -0.73
inoxydable    -0.72
selbe         -0.72
Aimee         -0.72
wezen         -0.72
argint        -0.71

Positive Logits
wall          2.27
WALL          2.20
Wall          2.15
walls         2.06
Wall          1.98
wall          1.97
WALL          1.86
Walls         1.84
walls         1.72
Walls         1.69
Activations Density 0.042%
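
One plausible reading of an activation density figure like the one above is the fraction of sampled token positions on which the feature's activation is above zero. The source does not define the metric, so that definition is an assumption, and the function name `activation_density` and the sample data below are hypothetical. A minimal sketch under those assumptions:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
example = [0.0] * 9996 + [1.3, 0.7, 2.1, 0.4]
density = activation_density(example)
print(f"{density:.3%}")
```

A density this low would mean the feature fires on only a handful of tokens per ten thousand, which fits a feature that responds to a narrow set of tokens such as the "wall" variants listed under Positive Logits.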