INDEX
Explanations
instances of contrastive conjunctions or phrases indicating exceptions
Negative Logits
ksam -0.16
multif -0.15
TEL -0.15
اران -0.14
gregar -0.13
弥 -0.13
thrown -0.13
Fest -0.13
273 -0.13
Explorer -0.13
Positive Logits
.Invariant 0.15
Nass 0.14
landa 0.14
ναν 0.14
nap 0.14
hope 0.14
branch 0.14
su 0.14
_HINT 0.14
hope 0.14
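Lists of negative and positive logits like the ones above are commonly produced by projecting a feature's decoder (write) direction through the model's unembedding matrix and reading off the lowest and highest entries. The sketch below assumes that procedure. The names `decoder_dir`, `W_U`, `vocab` and `top_logit_effects` are hypothetical, and the dashboard's own values may include extra normalization or scaling not shown here.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Sketch: project a feature direction through the unembedding.

    decoder_dir: shape (d_model,), the feature's write direction (assumed).
    W_U: shape (d_model, vocab_size), the unembedding matrix (assumed).
    vocab: list of token strings indexed by token id.
    Returns (negative, positive) lists of (token, logit) pairs.
    """
    # Direct effect of the feature on every vocabulary logit.
    logits = decoder_dir @ W_U  # shape (vocab_size,)
    order = np.argsort(logits)  # token ids sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional model with a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.2, 0.1, -0.5],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logit_effects(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # most suppressed tokens first
print(pos)  # most promoted tokens first
```

Because only the first coordinate of `decoder_dir` is nonzero, the second row of `W_U` has no effect, which makes the toy output easy to check by hand.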
Activation density: 0.235%
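Assuming activation density means the share of token positions on which the feature's activation is above zero, a figure of 0.235% corresponds to roughly 235 active positions per 100,000 tokens. A minimal sketch of that calculation follows; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: 100,000 token positions, the feature fires on 235 of them.
acts = np.zeros(100_000)
acts[:235] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")
```

A low density like this one indicates a sparse feature that fires on only a small fraction of tokens.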