Explanations
phrases indicating finality or conclusion
words that indicate possession or relation
Negative Logits
rul	-0.82
destro	-0.73
nodd	-0.69
ahon	-0.68
seiz	-0.67
cous	-0.66
ーティ	-0.64
Citiz	-0.62
warr	-0.60
Reloaded	-0.59
Positive Logits
lest	0.93
©	0.89
.:	0.84
.	0.84
when	0.77
.#	0.76
.</	0.75
.[	0.74
.<	0.74
©	0.74
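The page does not say how the logit values are computed. Here is a minimal sketch. It assumes the convention common to sparse-autoencoder dashboards: each value is the dot product of the feature's decoder direction with one token's unembedding vector. Under that assumption, the two lists are the k highest and k lowest of these scores. The helper names `logit_weights` and `top_and_bottom`, the 2-dimensional vectors, and the four-token vocabulary are all hypothetical and serve only as illustration.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_weights(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Assumption: a feature's direct effect on a token's logit is the dot
    # product of its decoder direction with that token's unembedding vector.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(weights: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    # Highest k scores ("positive logits") and lowest k ("negative logits").
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy example with made-up vectors for a hypothetical four-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {
    "lest": [0.9, 0.0],
    ".": [0.5, -0.6],
    "rul": [-0.4, 0.8],
    "when": [0.3, 0.2],
}
positive, negative = top_and_bottom(logit_weights(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

In a real model, the decoder direction and unembedding matrix would have thousands of dimensions, and the scores would cover the whole vocabulary. The ranking step is the same.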
Activation Density: 0.463%
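Read one way, a density of 0.463% means the feature fires on about 463 of every 100,000 token positions. Here is a minimal sketch of that calculation. It assumes density is the percentage of tokens whose activation exceeds 0; the page does not state the threshold or the token sample it used. `activation_density` is a hypothetical helper, and the activation list is synthetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # Percentage of token positions where the feature's activation is above
    # the threshold (assumed to be 0, i.e. "the feature fired").
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic sample: the feature fires on 463 of 100,000 token positions.
acts = [1.0] * 463 + [0.0] * (100_000 - 463)
print(f"{activation_density(acts):.3f}%")  # prints 0.463%
```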