INDEX
Explanations
collective pronouns and phrases indicating shared experience or knowledge
Negative Logits
oksen       -0.15
esti        -0.15
arine       -0.15
quia        -0.15
avia        -0.14
окон        -0.14
ucceeded    -0.14
roid        -0.14
\xaa        -0.13
lia         -0.13
Positive Logits
meet          0.25
witness       0.24
follows       0.22
follow        0.21
meeting       0.21
accompany     0.21
Meet          0.21
priv          0.20
learn         0.20
accompanies   0.20
Activations Density 0.038%