INDEX
Explanations
names and identifiers related to authors and works
Negative Logits
jang     -0.19
ccione   -0.16
osh      -0.15
argas    -0.15
ά        -0.14
alk      -0.14
anking   -0.14
argo     -0.14
viewer   -0.14
otel     -0.14
Positive Logits
pand        0.17
tob         0.16
""},↵       0.15
anda        0.15
763         0.15
iker        0.14
iad         0.14
Bernstein   0.14
Weiss       0.14
poru        0.14
Activations Density: 0.069%