Explanations
pronouns that indicate second-person and first-person perspectives
Negative Logits
sobie       -0.18
box         -0.15
ually       -0.15
spor        -0.14
Forge       -0.14
vlas        -0.14
Ortiz       -0.14
ildo        -0.14
mac         -0.14
paramref    -0.14
Positive Logits
esson       0.16
è³          0.16
Pla         0.16
chooser     0.15
tabs        0.15
ekli        0.15
rong        0.15
_tokenize   0.15
andler      0.14
alive       0.14
Activations Density: 0.025%