Negative Logits
orig        -0.14
uben        -0.14
Oversight   -0.14
Controls    -0.14
ollah       -0.13
ë¨          -0.13
residence   -0.13
ÑĨей        -0.12
hits        -0.12
Sach        -0.12
Positive Logits
ERTICAL     0.16
å²³         0.16
zik         0.15
ISIBLE      0.15
usher       0.14
åĹ          0.14
wink        0.14
sdale       0.14
ysl         0.14
onView      0.14
Activations Density 0.011%