INDEX
Explanations
instances of the name "Joe" and its variations in different contexts
Negative Logits
ne          -0.20
âĹĦ         -0.20
uld         -0.18
rikes       -0.17
repo        -0.16
竹          -0.15
rian        -0.15
room        -0.15
te          -0.15
rij         -0.15
Positive Logits
Biden       0.21
ctions      0.17
Blow        0.17
cken        0.16
fractional  0.15
ys          0.15
Rog         0.15
Jonas       0.14
Stalin      0.14
oload       0.14
Activations Density 0.015%