Explanations
terms related to spatial or physical positioning, particularly focusing on the front or in close proximity
phrases that emphasize positioning or proximity
Negative Logits
Nanto    -0.82
bern     -0.73
ofi      -0.71
ilver    -0.70
umn      -0.68
iple     -0.67
RED      -0.67
GREEN    -0.66
upper    -0.65
ucci     -0.65
Positive Logits
prelim            0.73
edly              0.70
toile             0.63
¶ħ                0.61
apolog            0.61
Leilan            0.61
xual              0.61
instruction       0.60
pter              0.59
hallucinations    0.59
Activations Density: 0.075%