INDEX
NEGATIVE LOGITS
åłĤ
-0.21
isans
-0.18
yk
-0.18
ække
-0.17
itarian
-0.16
ÑģÑı
-0.15
Hubbard
-0.15
'gc
-0.15
smiles
-0.15
iland
-0.15
POSITIVE LOGITS
aret
0.30
oose
0.27
ildo
0.26
rio
0.25
ecera
0.25
ernet
0.24
Cab
0.22
by
0.22
Cab
0.20
cab
0.19
Activations Density 0.008%