INDEX
Explanations
references to the letter "K" in various contexts
Negative Logits
annes     -0.18
uger      -0.16
ahn       -0.16
NIL       -0.16
af        -0.16
uges      -0.15
bject     -0.15
wid       -0.15
ardash    -0.15
argas     -0.14
Positive Logits
haled     0.24
ieran     0.22
eri       0.20
ately     0.19
acey      0.19
rys       0.19
sen       0.19
irst      0.18
jet       0.17
ofi       0.16
Activations Density: 0.024%