INDEX
Explanations
important nouns and their relationships to actions and interests
Negative Logits
Token     Logit
bro       -0.15
رس        -0.15
Kod       -0.14
prom      -0.14
ibling    -0.14
rone      -0.14
èm        -0.14
pal       -0.13
plier     -0.13
avel      -0.13
Positive Logits
Token      Logit
swith      0.17
oints      0.15
[s         0.15
lessly     0.15
ennis      0.14
ws         0.14
inous      0.14
klad       0.14
storybook  0.14
ssf        0.14
Activations Density: 0.651%