Explanations
verbs denoting communication such as speaking, saying, and thinking
verbs indicating ongoing actions or contributions
Negative Logits
selves          -0.78
Higher          -0.70
respective      -0.70
wayne           -0.69
ocating         -0.68
iners           -0.67
iky             -0.65
RELATED         -0.64
wik             -0.63
illion          -0.62
Positive Logits
himself         0.82
herself         0.76
his             0.67
brilliantly     0.65
shotgun         0.63
Bord            0.60
unsuccessfully  0.60
sage            0.59
valiant         0.59
ographs         0.59
Activations Density 0.420%