INDEX
Explanations
instances of the word "from" next to descriptions or explanations of sources or origins
Negative Logits
merce      -0.82
fps        -0.80
anooga     -0.76
idav       -0.76
aqu        -0.72
olson      -0.69
nex        -0.69
chairs     -0.69
currency   -0.68
include    -0.67
Positive Logits
afar       1.13
nowhere    0.99
somewhere  0.99
whence     0.98
abroad     0.91
scratch    0.90
obscurity  0.84
elsewhere  0.81
thence     0.78
inside     0.75
Activations Density: 0.076%