INDEX
Explanations
phrases indicating relationships and identities of individuals or entities
Negative Logits
wire        -0.19
Wire        -0.16
ripp        -0.15
िब          -0.15
Horizon     -0.15
ocab        -0.14
Wiring      -0.14
915         -0.14
Riverside   -0.14
ogne        -0.13
Positive Logits
cul         0.16
urator      0.15
aise        0.15
cr          0.14
ASI         0.14
pios        0.14
aises       0.14
atories     0.14
aten        0.14
wnd         0.14
Activations Density: 0.004%