INDEX
Explanations
words that indicate actions or states of being
Negative Logits
indebted        -0.16
shal            -0.15
арам            -0.15
paged           -0.14
нав             -0.14
hl              -0.14
дал             -0.14
Đông            -0.14
exampleInput    -0.14
fac             -0.14
Positive Logits
ibi             0.14
Cul             0.14
nackte          0.14
Samp            0.14
.uni            0.14
atoi            0.14
capped          0.14
ovit            0.14
enda            0.13
*(*             0.13
Activations Density 0.002%