INDEX
Explanations
instances of the word "gri" or related word forms
Negative Logits
ncy       -0.18
nout      -0.16
rou       -0.15
ritch     -0.15
cop       -0.15
mission   -0.15
λον       -0.14
rowse     -0.14
nd        -0.14
roud      -0.14
Positive Logits
i         0.27
pped      0.21
bose      0.20
iid       0.20
ps        0.19
ãĥ¥       0.19
pen       0.19
osite     0.19
ego       0.19
pping     0.18
Activations Density 0.052%