INDEX
Explanations
prepositions indicating relationships or connections
Negative Logits
il      -0.18
èĻ      -0.17
(il     -0.17
da      -0.16
arel    -0.16
dela    -0.16
dal     -0.15
,       -0.15
la      -0.15
boro    -0.15
Positive Logits
etro    0.36
agnost  0.29
urn     0.29
cui     0.24
resse   0.23
abet    0.23
abol    0.22
front   0.21
orig    0.21
rott    0.21
Activations Density: 0.008%