Explanations
phrases that emphasize comparisons or similarities
Negative Logits
Token    Logit
æºĸ      -0.16
bil      -0.15
atti     -0.15
krom     -0.15
-UA      -0.15
ettes    -0.15
uyen     -0.14
eks      -0.14
zos      -0.14
SCAN     -0.14
Positive Logits
Token    Logit
asad     0.18
unto     0.15
ligt     0.15
those    0.14
γεÏģι    0.14
lige     0.14
nier     0.14
ney      0.14
unto     0.14
lÃŃ      0.14
Activations Density: 0.041%