INDEX
Explanations
numerals and measurements
references to sections or parts of a text or website
Negative Logits
nt           -0.54
ruining      -0.53
worthless    -0.52
didnt        -0.51
violation    -0.50
rubbish      -0.50
rejection    -0.50
wont         -0.48
useless      -0.48
faults       -0.48
Positive Logits
alan         0.62
姫           0.59
luster       0.59
arger        0.55
lique        0.55
ular         0.53
reditary     0.52
pherd        0.50
atown        0.49
ordial       0.49
Activations Density 0.692%