Explanations
references to original versions or originals
instances of the word "original"
Negative Logits
orney             -0.74
robe              -0.72
risome            -0.70
rogens            -0.69
opping            -0.69
walk              -0.69
abad              -0.68
=-=-=-=-=-=-=-=-  -0.68
rom               -0.65
ammy              -0.64
Positive Logits
ITY               0.93
incarnation       0.89
ity               0.88
batch             0.79
Filename          0.77
impetus           0.77
ドラゴン          0.76
isations          0.75
trilogy           0.75
istar             0.71
Activations Density: 0.016%