Explanations
proper nouns, particularly names of individuals and entities
Negative Logits
ept        -0.15
ẹp         -0.15
omi        -0.15
ÌĢ         -0.14  (byte-level form of U+0300, combining grave accent)
.volley    -0.14
796        -0.14
ferred     -0.14
permit     -0.14
odyn       -0.14
.Bunifu    -0.14
Positive Logits
Sisters     0.17
Brothers    0.16
uis         0.16
brothers    0.15
_rw         0.15
ì͍          0.15
Ïĩο         0.14
å§ĵ         0.14  (byte-level form of 姓, "surname")
Bros        0.14
æ°ı         0.14  (byte-level form of 氏, "clan; family name")
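Two things in these lists are easier to follow with a little code. First, strings such as "å§ĵ" and "æ°ı" look garbled because byte-level BPE tokenizers (GPT-2 and many later models) store each raw byte as a printable stand-in character; mapping them back yields 姓 and 氏. Second, top and bottom logit lists like these are commonly obtained by multiplying a feature's decoder vector by the model's unembedding matrix and ranking the result. The sketch below assumes both points, neither of which this page states: that the model uses the GPT-2 byte-to-unicode table, and that the lists come from that plain projection. `decode_token`, `top_bottom_logits`, and the toy matrices are hypothetical illustrations.

```python
# Sketch only: assumes a GPT-2-style byte-level BPE tokenizer; all names are hypothetical.


def bytes_to_unicode():
    """Rebuild GPT-2's reversible byte -> printable-character table."""
    # Bytes that already print cleanly map to themselves.
    bs = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    cs = bs[:]
    # Every other byte (space, control codes, 0x7F-0xA0, 0xAD) gets U+0100 onward.
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable stand-in character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Convert a byte-level token string (e.g. 'å§ĵ') back to UTF-8 text.

    Raises KeyError for characters that are not in the 256-entry byte table.
    """
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A token can end mid-character, so undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


def top_bottom_logits(decoder_row, W_U, k):
    """Project one feature direction through the unembedding and rank tokens.

    decoder_row: list of d_model floats (hypothetical feature decoder vector).
    W_U: d_model rows, each a list of vocab floats (hypothetical unembedding).
    Returns (top-k token ids, bottom-k token ids, full list of logits).
    """
    d_model, vocab = len(decoder_row), len(W_U[0])
    logits = [
        sum(decoder_row[i] * W_U[i][j] for i in range(d_model))
        for j in range(vocab)
    ]
    order = sorted(range(vocab), key=lambda j: logits[j])  # ascending
    return order[::-1][:k], order[:k], logits


if __name__ == "__main__":
    print(decode_token("å§ĵ"), decode_token("æ°ı"))  # 姓 氏
    top, bottom, logits = top_bottom_logits(
        [1.0, -1.0], [[0.5, 0.0, 2.0], [0.0, 1.0, 0.5]], k=1
    )
    print(top, bottom, logits)  # [2] [1] [0.5, -1.0, 1.5]
```

The projection here is the simplest possible version; a real dashboard may apply extra scaling or normalization before ranking.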
Activation Density: 0.035%
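The page does not define density. Assuming it means the share of sampled token positions where the feature's activation is above zero, 0.035% works out to roughly 35 firing tokens per 100,000, or about one token in 2,900. A minimal sketch of that calculation, using a hypothetical `activation_density` helper and toy data:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where a feature fires.

    Assumption: "fires" means strictly above `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Toy data: 35 non-zero activations among 100,000 token positions.
    acts = [0.0] * 99_965 + [1.2] * 35
    print(f"{activation_density(acts):.3%}")  # 0.035%
```

A positive `threshold` would count only stronger activations and lower the reported density, so the definition matters when comparing features.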