Explanations
phrases related to individual agency or actions taken independently
instances of the word "themselves"
Negative Logits
pour        -0.74
Derby       -0.73
ammy        -0.72
rise        -0.71
von         -0.65
mare        -0.65
yip         -0.64
ulu         -0.64
Surge       -0.64
grade       -0.64
Positive Logits
selves       1.06
selves       1.04
underwater   0.78
pecially     0.77
creatively   0.72
respective   0.71
themselves   0.71
fict         0.70
BOOK         0.70
å            0.70
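Lists like the negative and positive logits above are commonly produced by projecting a feature's output (decoder) direction onto the model's unembedding and keeping the most negative and most positive tokens. The sketch below illustrates that ranking under stated assumptions: `decoder_dir`, the toy `unembed` vectors, and the helper names `logit_effects` and `top_and_bottom` are all hypothetical, and this is not necessarily how this dashboard computed its values.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    # Assumption: a feature's effect on a token's logit is the dot product
    # of the feature's decoder direction with that token's unembedding vector.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending once; the head holds the most negative effects,
    # the reversed list holds the most positive effects.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]
    top = ranked[::-1][:k]
    return top, bottom


# Toy, made-up 2-dimensional vectors purely for illustration.
decoder_dir = [1.0, 0.5]
unembed = {
    "selves": [1.0, 0.2],
    "pour": [-0.7, -0.1],
    "rise": [-0.5, -0.4],
    "BOOK": [0.6, 0.2],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print([t for t, _ in top])     # ['selves', 'BOOK']
print([t for t, _ in bottom])  # ['pour', 'rise']
```

Sorting once and slicing both ends keeps the two lists consistent with each other; a real model would use vectors with thousands of dimensions and a vocabulary of tens of thousands of tokens.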
Activation density: 0.037%
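A density of 0.037% corresponds to 37 firing positions per 100,000 tokens. The sketch below assumes density means the percentage of token positions where the feature's activation exceeds zero; the function name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's exact definition may differ.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    # Assumption: a position "fires" when its activation is strictly above threshold.
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    # Return a percentage, matching how the dashboard reports density.
    return 100.0 * firing / len(acts)


# 37 firing positions out of 100,000 tokens.
sample = [1.0] * 37 + [0.0] * 99_963
print(activation_density(sample))
```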