INDEX
Explanations
phrases indicating possession or attribution
Negative Logits
unner   -0.16
iper    -0.15
ipers   -0.14
332     -0.13
irsch   -0.13
ropa    -0.13
316     -0.13
ropp    -0.13
141     -0.13
arm     -0.13
Positive Logits
ãĥ³ãĤ¸  0.16
klär    0.15
acco    0.15
Counts  0.14
Ì£      0.14
į¼      0.14
_nth    0.13
¾       0.13
495     0.13
ãĥ£     0.13
Activations Density 0.101%