Explanations
phrases that describe something well-known in terms of its particular characteristics or reputation
phrases indicating ownership or attribution
Negative Logits
Token     Logit
river     -0.81
Rasm      -0.72
PLEASE    -0.70
Wem       -0.70
ievers    -0.69
after     -0.68
somew     -0.66
before    -0.65
wrong     -0.65
soever    -0.65
Positive Logits
Token         Logit
own           1.20
inability     1.10
propensity    1.04
unorthodox    1.01
versatility   0.99
lack          0.98
penchant      0.96
proximity     0.95
portrayal     0.93
tendency      0.93
Activations Density 0.137%
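
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading under stated assumptions: "fires" is taken to mean an activation strictly above a threshold of 0, the function name `activation_density` is hypothetical, and the sample data is synthetic, chosen only so the result matches the reported 0.137%. It is not the dashboard's own code.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Synthetic sample (not real data): 137 active tokens out of 100,000.
sample = [0.0] * 99_863 + [1.0] * 137
print(f"{activation_density(sample):.3%}")  # prints 0.137%
```

A density this low means the feature is sparse: on this reading it is active on roughly 1 token in 730.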