INDEX
Explanations
common prepositions and conjunctions indicating relationships between phrases and concepts
Negative Logits
adder     -0.18
adders    -0.16
opot      -0.16
isman     -0.15
dom       -0.15
edar      -0.15
adas      -0.15
orig      -0.14
Domin     -0.14
alamat    -0.14
Positive Logits
/cms      0.17
omi       0.17
424       0.15
voje      0.15
дво       0.15
accom     0.15
nier      0.14
413       0.14
426       0.14
العالم    0.14
Activations Density 0.001%