Explanations
phrases that emphasize similarity and shared experiences
Negative Logits
ocker     -0.16
erland    -0.15
\Php      -0.14
ington    -0.14
eric      -0.14
ialog     -0.14
mony      -0.14
erus      -0.14
uckles    -0.14
ßer       -0.14
Positive Logits
identical  0.43
similar    0.40
相同       0.38
alike      0.35
同じ       0.34
same       0.34
same       0.34
同         0.33
Same       0.33
similar    0.32
Activations Density 0.223%