INDEX
Explanations
phrases emphasizing individual contributions or relationships among people
Negative Logits
Token       Logit
each        -0.17
riad        -0.15
edic        -0.14
anian       -0.14
Ù           -0.14
ks          -0.14
ric         -0.14
iverse      -0.14
atcher      -0.14
rian        -0.14

Positive Logits
Token       Logit
ãĢħ         0.19
/all        0.17
respective  0.17
Nacht       0.17
successive  0.16
ting        0.16
ì¢ħ         0.15
others      0.15
strar       0.15
contre      0.15
Activations Density 0.064%
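
The density figure above is not defined on this page. A minimal sketch of one common reading, assuming it means the fraction of sampled token positions on which the feature has a nonzero activation, is given below. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means share of positions where the feature fires;
    the dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 6 of which activate the feature.
sample = [0.0] * 10000
for i in (12, 340, 2048, 5000, 7777, 9999):
    sample[i] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage, like the figure above
```

Under this reading, a density of 0.064% would correspond to the feature firing on roughly 6 or 7 of every 10,000 token positions.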