INDEX
Explanations
references to personal relationships and interactions
Negative Logits
ÙħÙĪØ§Ø·   -0.16
ukt      -0.16
zon      -0.14
998      -0.14
itas     -0.14
okoj     -0.14
742      -0.14
Muj      -0.14
347      -0.14
539      -0.14
Positive Logits
ilden    0.17
.wr      0.16
apel     0.15
at       0.15
hee      0.14
Matte    0.14
dikke    0.14
ilecek   0.14
ichel    0.14
det      0.14
Activations Density 0.002%