Explanations
words related to behavioral inclinations or propensities
mentions of the word "tendency" and its contextual implications
Negative Logits
Token     Logit
estamp    -0.78
idden     -0.76
mberg     -0.75
aban      -0.74
gur       -0.69
oho       -0.69
ÄŁ        -0.69
riel      -0.69
arer      -0.68
loo       -0.66
Positive Logits
Token        Logit
tendency     1.05
toward       0.99
towards      0.96
tendencies   0.90
propensity   0.82
predis       0.79
shift        0.78
prone        0.71
behaviour    0.71
aversion     0.71
Activations Density: 0.025%
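
For readers unfamiliar with the density figure, the sketch below shows one way such a number could be computed. It assumes that activation density means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as "firing" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 1 of 4000 sampled positions.
sample = [0.0] * 3999 + [2.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.025%
```

Under this reading, a density of 0.025% corresponds to the feature firing on roughly 1 in every 4,000 tokens, which fits a sparse feature tied to a narrow group of words.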