Explanations
phrases that describe common situations or concepts
Negative Logits
thren        -0.16
anton        -0.15
lio          -0.15
ingleton     -0.14
ntity        -0.14
bjerg        -0.14
Advantage    -0.14
essler       -0.14
Matters      -0.14
advantage    -0.14
Positive Logits
xuyên        0.27
occurrence   0.26
-place       0.23
among        0.21
amongst      0.20
encountered  0.20
occurring    0.20
/pop         0.19
-found       0.19
occ          0.19
Activations Density 0.072%