Explanations
references to cards or card-related terminology
Negative Logits
Token       Logit
ously       -0.18
ırak        -0.16
erties      -0.16
-cn         -0.16
olu         -0.16
icz         -0.16
rok         -0.15
OUS         -0.15
usement     -0.15
estone      -0.15
Positive Logits
Token       Logit
.Card       0.25
inality     0.25
igan        0.24
iology      0.24
inals       0.24
/Card       0.24
INAL        0.23
card        0.23
inal        0.23
.card       0.22
Activations Density: 0.010%