INDEX
Explanations
proper nouns related to research and academia
Negative Logits
onso         -0.17
quet         -0.16
öst          -0.15
ipi          -0.15
vla          -0.14
.mime        -0.14
warm         -0.14
برد          -0.14
NotAllowed   -0.14
eyse         -0.14
Positive Logits
rum          0.15
rip          0.15
rum          0.15
ar           0.15
Rum          0.14
Medal        0.14
rix          0.14
Lazy         0.14
Rog          0.14
mos          0.14
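Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix (W_U) and keeping the k highest and k lowest scores. The page does not say how its values were computed, so the following is only a minimal sketch under that assumption. The names `logit_effects` and `top_and_bottom` and the toy weights are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score every vocabulary token by projecting a feature direction through W_U.

    Assumption: decoder_dir is the feature's d_model-length decoder vector and
    unembed is W_U stored as d_model rows, each with one entry per vocab token.
    Returns a list of (token, score) pairs.
    """
    n_vocab = len(vocab)
    scores = [0.0] * n_vocab
    # Accumulate the dot product decoder_dir @ W_U one model dimension at a time.
    for d, weight in enumerate(decoder_dir):
        row = unembed[d]
        for t in range(n_vocab):
            scores[t] += weight * row[t]
    return list(zip(vocab, scores))


def top_and_bottom(pairs, k):
    """Return (k most positive, k most negative) pairs, each ordered by magnitude."""
    ascending = sorted(pairs, key=lambda p: p[1])
    return ascending[::-1][:k], ascending[:k]


if __name__ == "__main__":
    # Toy example: 2 model dimensions, 3 hypothetical tokens.
    vocab = ["a", "b", "c"]
    w_u = [[0.5, 0.0, -0.2],
           [0.1, 0.3, 0.0]]
    positive, negative = top_and_bottom(logit_effects([1.0, -1.0], w_u, vocab), 1)
    print("positive:", positive)
    print("negative:", negative)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other. It also mirrors the page's ordering, where the negative list starts with the most negative value.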
Activation Density: 0.038%
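Activation density is usually read as the fraction of tokens in a sample on which the feature fires. A density of 0.038% therefore corresponds to about 38 activating tokens per 100,000. The sketch below assumes that "fires" means an activation strictly above zero. The threshold and the name `activation_density` are illustrative assumptions, not the dashboard's actual code.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a feature counts as firing on a token when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 38 firing tokens out of 100,000.
    sample = [1.0] * 38 + [0.0] * (100_000 - 38)
    print(f"{activation_density(sample):.3%}")
```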