INDEX
Explanations
phrases related to behavior or traits that are typical or often observed in particular situations
phrases that indicate habitual behaviors or inclinations
Negative Logits
aban      -0.80
idden     -0.77
mberg     -0.75
gur       -0.75
ÄŁ        -0.73
estamp    -0.72
loo       -0.72
riel      -0.71
oho       -0.71
ighth     -0.71
Positive Logits
toward        1.05
towards       0.99
tendency      0.97
tendencies    0.95
predis        0.78
shift         0.75
propensity    0.75
behavior      0.73
Towards       0.72
behaviour     0.72
Activations Density 0.023%
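
A minimal sketch of how a density figure like this could be computed, assuming "density" means the fraction of tokens on which the feature's activation is above zero (the dashboard does not state its exact definition). The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens with activation > threshold) / (all tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 23 active tokens out of 100,000 in total.
toy_activations = [1.5] * 23 + [0.0] * 99_977
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.023% corresponds to the feature firing on roughly 23 of every 100,000 tokens.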