INDEX
Explanations
references to specific organizations or groups in various contexts
occurrences of specific entities and references to numerous components in a structured context
quantifiers and entities
Negative Logits
pertenecen      -0.59
is              -0.52
bukanlah        -0.50
のは            -0.48
なのは          -0.48
appartiennent   -0.48
pertence        -0.48
constituye      -0.45
adalah          -0.45
belong          -0.45
Positive Logits
decides         1.03
EconPapers      1.01
rungsseite      1.01
wants           1.00
chooses         1.00
decided         0.97
thinks          0.97
chose           0.97
يتيمه           0.94
undertook       0.94
Activations Density 0.447%