INDEX
Explanations
phrases that express cumulative actions or ideas
Negative Logits
orges     -0.16
責        -0.15
ÙĪØ·      -0.14
amar      -0.14
266       -0.14
agina     -0.14
riority   -0.13
orge      -0.13
å»        -0.13
ury       -0.13
Positive Logits
Paulo     0.15
iland     0.15
onn       0.15
Jobs      0.15
643       0.14
Mali      0.14
497       0.14
ESIS      0.14
sov       0.14
ÑĢа       0.13
Activations Density: 0.208%