INDEX
Explanations
phrases involving the concept of going or being backwards
references to performing actions in reverse
Negative Logits
rament          -0.83
raltar          -0.80
ateurs          -0.79
"},"            -0.75
chens           -0.75
riz             -0.75
akings          -0.74
atum            -0.74
ulet            -0.73
anooga          -0.72
Positive Logits
stairs          0.94
wards           0.94
ward            0.86
compatibility   0.80
compat          0.78
step            0.76
spiral          0.73
WARD            0.72
fitted          0.71
side            0.70
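The page does not say how these token lists are computed. On feature dashboards of this kind they are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The minimal sketch below assumes exactly that. `logit_effects`, `top_and_bottom`, `decoder_dir`, `unembed`, and the toy numbers are all hypothetical and are not taken from the dashboard.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Hypothetical helper (assumed method): dot a feature's decoder direction
    with each token's unembedding column to get that token's logit contribution.

    decoder_dir: list of d_model floats
    unembed: d_model rows x len(vocab) columns (list of lists)
    vocab: token strings, one per unembedding column
    """
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects, k):
    """Hypothetical helper: return the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    return ranked[::-1][:k], ranked[:k]


# Toy data invented for illustration; not real model weights.
vocab = ["stairs", "wards", "side", "rament"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.6, 0.5, 0.2, -0.5],
    [0.2, 0.3, 0.1, -0.3],
]
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print([tok for tok, _ in positive])  # ['stairs', 'wards']
print([tok for tok, _ in negative])  # ['rament', 'side']
```

Under this assumption, a high positive value means that writing the feature's direction into the residual stream pushes the model toward predicting that token, which is consistent with tokens like "wards" and "stairs" topping a feature labeled as being about going backwards.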
Activation Density: 0.017%
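A density of 0.017% means the feature is active on roughly 17 of every 100,000 tokens, so it fires very sparsely. The sketch below assumes density is measured as the fraction of sampled token positions whose activation exceeds zero. The page does not state the threshold or the sample, and `activation_density` and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature's
    activation exceeds `threshold` (assumed definition of density)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample invented for illustration: the feature fires on 17 of 100,000 tokens.
acts = [0.0] * 99_983 + [1.0] * 17
density = activation_density(acts)
print(f"{density:.3%}")  # 0.017%
```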