Explanations
phrases indicating rankings or positions of entities
Negative Logits
Token         Logit
cast          -0.48
<bos>         -0.48
zelfde        -0.47
arada         -0.45
casting       -0.44
vecka         -0.43
rsiniz        -0.43
dore          -0.42
やってきた    -0.42
vindo         -0.42
Positive Logits
Token         Logit
fastest       0.98
largest       0.97
ⓧ             0.91
widest        0.91
smartest      0.90
ویکیپدیا      0.88
brightest     0.88
greatest      0.88
#+#           0.87
busiest       0.86
Activations Density: 0.296%
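
An activation density such as the 0.296% figure is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, and the dashboard's actual sampling and activation threshold are not stated in this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as "active" when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active tokens out of 1000.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.296% would mean the feature is active on roughly 3 of every 1000 tokens in the sample.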