Explanations
words related to the direction of events or situations
references to the status or condition of things
Negative Logits
iciency
-0.72
essor
-0.65
tein
-0.61
disav
-0.59
asking
-0.59
¿½
-0.59
ritical
-0.59
sole
-0.58
overwrite
-0.57
ividual
-0.57
Positive Logits
downhill     0.77
for          0.75
between      0.73
backstage    0.71
unfold       0.68
roy          0.65
huh          0.64
diplom       0.63
Things       0.63
FOR          0.62
Activations Density 0.305%