INDEX
Explanations
the word "were" and its variations
Negative Logits
ax        -0.69
шой       -0.66
has       -0.63
hadiran   -0.60
pat       -0.58
FAS       -0.57
pat       -0.57
aprobó    -0.56
まと      -0.55
FAS       -0.55
Positive Logits
were      1.38
Were      1.33
Were      1.31
were      1.23
WERE      1.12
weren     1.02
weren     0.98
WER       0.95
étaient   0.93
wer       0.89
Activations Density: 0.219%