Explanations
phrases related to the concept of 'origin' or beginnings
Negative Logits
Token       Logit
ington      -0.18
iro         -0.15
ew          -0.14
alar        -0.14
°           -0.14
erence      -0.14
,           -0.14
if          -0.13
at          -0.13
erman       -0.13
Positive Logits
Token       Logit
ONGL        0.17
forge       0.15
YNC         0.15
entially    0.15
/source     0.15
arily       0.15
dden        0.15
obuf        0.15
ummings     0.15
ãĥĭãĤ¢      0.15
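The last token in the positive-logits list, `ãĥĭãĤ¢`, is probably not mojibake. It looks like the printable form that byte-level BPE vocabularies use for raw bytes. The sketch below assumes a GPT-2-style byte-to-character mapping (the page does not say which tokenizer the model uses, so this is an assumption) and maps such a token string back to readable text. The helper names `byte_to_char_table` and `decode_token` are hypothetical, chosen only for illustration.

```python
def byte_to_char_table():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Remaining bytes are assigned code points 256, 257, ... in byte order.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def decode_token(token: str) -> str:
    """Map a byte-level token string back to the UTF-8 text it encodes."""
    char_to_byte = {ch: b for b, ch in byte_to_char_table().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A token may stop in the middle of a multi-byte character, so bytes
    # that do not decode cleanly are replaced rather than raising.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĭãĤ¢"))  # ニア
```

Under that assumption the token corresponds to the katakana string ニア. The other tokens in both lists are plain ASCII (plus `°`) and decode to themselves.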
Activations Density: 0.028%