INDEX
Explanations
now let's continue analysis
Negative Logits
Although	0.60
Menurut	0.58
Aunque	0.57
മാത്രമല്ല	0.56
زیرا	0.55
雖然	0.55
Aunque	0.54
atschapp	0.53
Jangan	0.53
също	0.52
Positive Logits
,	1.34
we	1.30
they	1.29
there	1.10
،	1.09
it	1.06
you	0.99
,	0.97
,(	0.90
,《	0.90
Activations Density 1.228%
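
The density figure can be read as the share of sampled tokens on which the feature fires. The page does not state how it is computed, so the sketch below is only an assumption: it counts activations strictly above a threshold of zero. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the feature is active, which the source page does not confirm.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in values if a > threshold)
    return 100.0 * fired / len(values)


# Hypothetical activations for eight sampled tokens (not data from this page).
sample = [0.0, 0.0, 0.0, 2.1, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3f}%")  # prints 12.500%
```

Under this reading, a density of 1.228% would mean the feature is active on roughly one sampled token in 81.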