INDEX
Explanations
the word 'salary'
Negative Logits
co      -0.66
mor     -0.61
ter     -0.61
por     -0.58
tre     -0.57
ro      -0.57
er      -0.56
B       -0.56
b       -0.56
z       -0.55
POSITIVE LOGITS
Monfieur      1.29
Efq           1.28
ainfi         1.26
myſelf        1.25
<bos>         1.16
vectorielle   1.12
pleaſure      1.12
itſelf        1.10
feroit        1.10
themſelves    1.10
Activations Density 0.226%