INDEX
Explanations
phrases or words related to locations or directions
mentions of the word "here"
Negative Logits
natureconservancy  -0.65
srfAttach          -0.62
asting             -0.61
oppable            -0.59
meticulous         -0.58
protective         -0.58
jug                -0.58
ripp               -0.57
athom              -0.56
salvage            -0.56
Positive Logits
abouts   1.20
upon     1.05
soever   0.99
theless  0.98
here     0.95
tics     0.93
nces     0.91
fore     0.89
iton     0.85
tical    0.84
Activations Density: 0.005%