Explanations
the word "tendency" in various contexts
repeated references to the concept of "tendency."
Negative Logits
Token      Logit
mberg      -0.82
riel       -0.80
hemat      -0.74
gur        -0.73
han        -0.72
sung       -0.71
aban       -0.71
estamp     -0.71
enic       -0.69
ns         -0.67
Positive Logits
Token           Logit
tendency        1.07
tendencies      0.99
inconsistency   0.86
propensity      0.83
towards         0.81
toward          0.80
inference       0.77
aversion        0.77
predis          0.76
shift           0.75
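The page does not say how these logit scores are computed. A common approach in sparse-autoencoder dashboards is to project the feature's decoder direction onto each token's unembedding vector and then list the highest and lowest scores. The minimal sketch below follows that assumed approach. The helper names (`logit_effects`, `top_and_bottom`) and the two-dimensional toy vectors are hypothetical and are not taken from the model behind this page.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed definition: dot product of the feature's decoder direction
    with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Hypothetical toy values: a 2-d "residual stream" and four tokens.
decoder_dir = [1.0, 0.5]
unembed = {
    "tendency":   [1.0, 0.2],
    "propensity": [0.8, 0.0],
    "mberg":      [-0.5, -0.6],
    "riel":       [-0.4, -0.6],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("Positive:", positive)
print("Negative:", negative)
```

In this toy setup, tokens whose unembedding points in the same direction as the feature ("tendency", "propensity") rank highest, and tokens pointing away from it rank lowest.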
Activations Density: 0.011%
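The dashboard does not define activation density. The sketch below assumes it is the percentage of token positions where the feature's activation is strictly positive. Under that assumption, 0.011% corresponds to roughly 11 active positions per 100,000 tokens. The function name and the threshold parameter are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Assumed definition: percentage of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical data: 100,000 token positions, with the feature firing on 11 of them.
acts = [0.0] * 100_000
for i in range(11):
    acts[i * 9_000] = 1.0

print(f"{activation_density(acts):.3f}%")
```

A density this low means the feature is very sparse, which is consistent with it firing only on a narrow concept such as "tendency".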