INDEX
Explanations
words related to attraction or persuasion
instances of the word "ent."
Negative Logits
Token       Logit
士          -0.67
Spears      -0.65
inclusive   -0.61
phrine      -0.61
overdue     -0.60
BILITIES    -0.59
ة           -0.59
aneers      -0.58
Scotia      -0.58
ned         -0.58
Positive Logits
Token       Logit
ourage      1.18
ailed       1.11
rench       1.10
renched     1.08
rust        1.02
repre       1.02
rance       1.02
ropy        1.02
race        0.99
rained      0.96
Activations Density: 0.027%