INDEX
Explanations
phrases that describe or characterize individuals and concepts
Negative Logits
hoff      -0.15
uju       -0.14
506       -0.14
vak       -0.14
aser      -0.13
loom      -0.13
Advice    -0.13
303       -0.13
ating     -0.13
sö        -0.13
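Positive Logits
differently   0.28
as            0.24
sebagai       0.21   (Indonesian/Malay "as")
как           0.17   (Russian "as")
بأن           0.17   (Arabic "that")
曰            0.16   (Classical Chinese "to say; to be called")
jako          0.16   (Polish/Czech "as")
clusters      0.16
ως            0.16   (Greek "as")
як            0.15   (Ukrainian "as")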
Activations Density 0.104%
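The page does not say how the logit lists or the density figure are computed. The sketch below assumes the common convention for SAE feature dashboards. The logit effect of a feature is taken as its decoder direction projected through the model's unembedding matrix, ignoring the final layer norm. Activation density is taken as the fraction of tokens on which the feature's activation is positive. Under that reading, 0.104% means the feature fires on roughly 1 token in 960. The function names and toy numbers are hypothetical illustrations, not the dashboard's own code.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding.

    decoder_dir has length d_model; unembed is d_model x vocab_size.
    Returns each token's direct logit change per unit of feature activation
    (assumption: final layer norm is ignored).
    """
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative logit effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return positive, negative


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (hypothetical values).
    vocab = ["as", "differently", "hoff", "Advice"]
    decoder_dir = [1.0, -0.5]
    unembed = [[0.2, 0.3, -0.1, 0.0],
               [0.1, 0.1, 0.2, 0.1]]
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
    print("positive:", pos)
    print("negative:", neg)
    print("density:", activation_density([0.0, 1.3, 0.0, 0.0]))
```

Sorting the full effect dictionary once and slicing both ends keeps the positive and negative lists consistent with each other. A real model would use a matrix library for the projection instead of nested Python loops.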