INDEX
Explanations
references to walls or barriers in various contexts
Negative Logits
temprana      -0.84
})));         -0.82
Plural        -0.77
inoxydable    -0.76
Judson        -0.76
selbe         -0.76
wezen         -0.75
pleaſure      -0.75
epam          -0.74
íso           -0.73
Positive Logits
wall          2.07
WALL          2.02
Wall          1.94
walls         1.91
Wall          1.79
wall          1.78
Walls         1.75
WALL          1.68
Walls         1.61
walls         1.60
Activations Density 0.052%