INDEX
Explanations
comparisons and contrasts between ideas or entities
Negative Logits
gaard       -0.15
uard        -0.14
anou        -0.14
ez          -0.14
ί           -0.14
Overflow    -0.14
bah         -0.14
oris        -0.14
uč          -0.14
しの        -0.13
Positive Logits
ourselves    0.23
yourself     0.20
everyone     0.19
anyone       0.19
everybody    0.18
those        0.18
us           0.18
anybody      0.18
myself       0.18
ourn         0.17
Activations Density 0.309%
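
As a rough illustration of what a density figure like this measures, the sketch below assumes that activation density is the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample values are assumptions for illustration and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as "active" when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 1,000 tokens: 997 inactive, 3 active.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.309% would correspond to the feature firing on roughly 3 of every 1,000 tokens.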