Explanations
second-person pronouns and related phrases indicating personal experience or direct address
Negative Logits
Kron        -0.15
uslim       -0.14
кав         -0.14
shaw        -0.14
.dw         -0.14
ittest      -0.14
ypsum       -0.14
сь          -0.14
ourke       -0.13
ISCO        -0.13
Positive Logits
found        0.36
find         0.35
finds        0.34
found        0.32
finding      0.31
finden       0.30
encontrar    0.30
找到         0.29
Find         0.29
(find        0.28
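The two lists rank vocabulary tokens by how strongly this feature directly raises or lowers their logits. Dashboards commonly compute such a ranking by multiplying the feature's decoder direction by the model's unembedding matrix. The sketch below assumes that definition; the helper names `logit_effects` and `top_logits` and the toy weights are hypothetical and are not taken from this page.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    decoder_dir: the feature's decoder row, length d_model (assumed input).
    unembed: the model's unembedding matrix W_U, shape d_model x vocab.
    Returns one score per vocabulary token, i.e. decoder_dir @ W_U.
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][v] for i in range(d_model))
        for v in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], tokens: Sequence[str], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or most
    negative (positive=False) scores, paired with those scores."""
    order = sorted(range(len(effects)), key=lambda v: effects[v],
                   reverse=positive)
    return [(tokens[v], effects[v]) for v in order[:k]]


# Toy example: d_model = 2 and a vocabulary of three hypothetical tokens.
tokens = ["find", "Kron", "other"]
effects = logit_effects([1.0, 0.5], [[0.2, -0.1, 0.4], [0.6, 0.0, -0.8]])
print(top_logits(effects, tokens, 1))                  # [('find', 0.5)]
print(top_logits(effects, tokens, 1, positive=False))  # [('Kron', -0.1)]
```

Sorting indices rather than tokens keeps duplicate token strings (such as the two `found` entries, which are likely distinct tokenizer entries) as separate rows.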
Activation Density: 0.074%
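Activation density reads as the share of tokens on which the feature fires, so 0.074% corresponds to roughly 74 of every 100,000 tokens. Below is a minimal sketch, assuming "fires" means an activation strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 74 active tokens out of 100,000 reproduces a 0.074% density.
acts = [1.0] * 74 + [0.0] * (100_000 - 74)
print(activation_density(acts))
```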