INDEX
Explanations
references to individuals and roles in narratives or documents
Negative Logits
teenth    -0.21
ses       -0.18
phans     -0.18
resse     -0.18
ımıza     -0.17
istra     -0.17
niest     -0.17
ials      -0.17
sembles   -0.17
panse     -0.16
POSITIVE LOGITS
gether    0.51
linear    0.49
existent  0.47
neath     0.45
ductory   0.45
adecimal  0.43
selling   0.42
teen      0.40
west      0.40
etheless  0.39
Activations Density 0.747%