Explanations
colloquial expressions and variations of the word "lo."
Negative Logits
Token   Logit
ãĥ³     -0.28
gard    -0.21
o       -0.21
g       -0.19
gien    -0.18
nul     -0.18
dum     -0.18
tod     -0.17
y       -0.17
d       -0.17
Positive Logits
Token   Logit
ped     0.27
path    0.23
idy     0.22
ping    0.22
rent    0.21
ren     0.21
pad     0.21
so      0.20
ret     0.20
rem     0.19
Activations Density 0.019%