INDEX
Explanations
specific words or phrases that indicate personal names or notable entities
Negative Logits
Token    Logit
ka       -0.32
li       -0.25
che      -0.25
pro      -0.24
ben      -0.24
bo       -0.24
nya      -0.23
be       -0.23
la       -0.23
tr       -0.22
Positive Logits
Token      Logit
Ùĭ         0.28
ught       0.25
eus        0.25
'nın       0.25
’nın       0.25
issance    0.23
frica      0.22
irement    0.22
ugh        0.21
ughter     0.21
Activations Density: 0.902%