Explanations
phrases related to habitual tendencies or behaviors
phrases that describe tendencies or behavioral patterns
Negative Logits
Token   Logit
gur     -0.76
yz      -0.74
arta    -0.71
lain    -0.69
ZA      -0.64
ania    -0.62
zbek    -0.60
fil     -0.60
ft      -0.59
KY      -0.59
Positive Logits
Token    Logit
rils     1.36
entious  1.18
ril      1.06
erer     0.88
erers    0.86
erest    0.84
toward   0.84
towards  0.80
entimes  0.80
eman     0.79
Activations Density 0.018%