Negative Logits
Token      Logit
Neville    -0.14
spo        -0.14
‘          -0.14
.comp      -0.14
ery        -0.14
Flush      -0.14
apo        -0.14
Elder      -0.14
jee        -0.14
ìĸij        -0.14
Positive Logits
Token      Logit
館         0.17
iç         0.15
istle      0.15
isti       0.15
chwitz     0.15
:↵↵↵↵↵↵    0.15
ichert     0.15
edla       0.14
ÑĥлÑı      0.14
isay       0.14
Activation Density: 0.040%