INDEX
Explanations
words indicating quantities or counts related to groups or categories
Negative Logits
liv  -0.18
kus  -0.15
atori  -0.15
ニア  -0.14
慎  -0.14
adier  -0.14
kus  -0.14
liv  -0.13
aph  -0.13
Liv  -0.13
Positive Logits
others  0.19
other  0.19
diğer  0.18
other  0.17
its  0.16
ien  0.15
anderen  0.15
dalších  0.15
其他  0.15
jego  0.15
Activations Density 0.142%