INDEX
Explanations
references to the concept of "new."
Negative Logits
ampo      -0.15
ımıza     -0.15
994       -0.15
829       -0.14
wizard    -0.14
otto      -0.14
acea      -0.14
ojis      -0.13
ımızda    -0.13
-known    -0.13
Positive Logits
York       0.27
roz        0.24
England    0.23
Testament  0.23
Year       0.23
Deal       0.22
Age        0.22
bie        0.22
testament  0.21
Orleans    0.21
Activations Density 0.063%