INDEX
Explanations
phrases emphasizing collective actions or perspectives over individual ones
Negative Logits
Irwin       -0.79
imura       -0.71
itor        -0.71
hillary     -0.69
Saving      -0.69
Karin       -0.67
Fitzgerald  -0.67
irlf        -0.64
GOODMAN     -0.64
Irvine      -0.63
Positive Logits
discrete    1.16
gradual     1.00
staggered   0.99
epis        0.98
abstract    0.93
continuous  0.93
fragmented  0.93
linear      0.93
binary      0.92
sequential  0.91
Activations Density: 1.076%