INDEX
Explanations
references to literature and its social and cultural significance
Negative Logits
iner     -0.20
dit      -0.16
emit     -0.16
reate    -0.15
فق       -0.15
rough    -0.14
ime      -0.14
άκ       -0.14
amo      -0.14
af       -0.14
Positive Logits
its      0.28
Its      0.26
Its      0.23
它       0.19
annya    0.19
它们     0.18
其       0.18
its      0.18
it       0.18
suoi     0.17
Activations Density 0.379%