INDEX
Explanations
prepositional phrases indicating location or position
Negative Logits
eh       -0.07
chn      -0.07
pit      -0.06
Lean     -0.06
oran     -0.06
ly       -0.06
achat    -0.06
lyph     -0.06
arn      -0.06
lea      -0.06
Positive Logits
bottom       0.07
MethodImpl   0.07
foy          0.07
439          0.07
askell       0.06
oyal         0.06
_ONCE        0.06
-parse       0.06
-bottom      0.06
omentum      0.06
Activations Density: 0.010%