Explanations
comparative phrases expressing resemblance
Negative Logits
alez        -0.77
alt         -0.77
inion       -0.77
byn         -0.72
utherland   -0.72
irtual      -0.72
arcity      -0.71
rax         -0.71
itles       -0.71
otom        -0.71
Positive Logits
lier         1.17
liest        1.07
crap         1.03
lihood       1.00
shit         0.78
filler       0.72
an           0.70
gib          0.69
a            0.69
something    0.69
Activations Density 0.511%