INDEX
Explanations
pronouns followed by verbs
the pronoun "it."
Negative Logits
hips        -0.62
Polk        -0.58
911         -0.56
Friend      -0.54
ãĥ¼ãĥĨãĤ£   -0.54
quist       -0.51
Uni         -0.50
Gloria      -0.49
friends     -0.49
Pearce      -0.48
Positive Logits
self        0.93
chy         0.88
alian       0.86
unes        0.86
zbollah     0.83
iner        0.77
ueller      0.77
chwitz      0.70
asca        0.70
achi        0.68
Activations Density 0.350%