INDEX
Explanations
references to specific spans or structures in textual data
NEGATIVE LOGITS
Token      Logit
yas        -0.17
-ÑĤо       -0.16
ystone     -0.16
Spatial    -0.16
ermann     -0.16
sk         -0.15
Spatial    -0.15
annon      -0.15
ted        -0.15
erman      -0.15
POSITIVE LOGITS
Token      Logit
ned        0.31
ning       0.29
nable      0.27
iards      0.27
iard       0.26
nung       0.20
Span       0.19
span       0.18
oud        0.17
berger     0.17
Activations Density: 0.012%