INDEX
Explanations
instances of the word "it" across various contexts
Negative Logits
Either     -0.15
either     -0.15
anca       -0.15
Sokol      -0.14
either     -0.14
wards      -0.13
決         -0.13
cÃŃ        -0.13
Either     -0.13
ories      -0.13
Positive Logits
pert        0.23
done        0.20
pert        0.19
always      0.19
always      0.17
Pert        0.16
relates     0.16
-done       0.15
siempre     0.15
happens     0.15
Activation Density: 0.117%