INDEX
Explanations
phrases that emphasize collective or shared experiences
NEGATIVE LOGITS
ayet       -0.16
onu        -0.16
anou       -0.16
egree      -0.16
infinity   -0.15
chwitz     -0.15
yang       -0.14
于是       -0.14
yat        -0.14
izmet      -0.14
POSITIVE LOGITS
毕         0.20
proč       0.17
weren      0.17
aren       0.17
why        0.17
wasn       0.16
who        0.16
isn        0.15
Who        0.15
přece      0.15
Activations Density 0.021%
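
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of sampled token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, feature fires on 21 of them.
sample = [0.0] * 99_979 + [0.5] * 21
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.021% means the feature fires on roughly 2 of every 10,000 tokens, which marks it as a sparse feature.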