Explanations
mentions of habits that characters have
the word "habit" and its contextual variations
Negative Logits
Token     Logit
ndum      -0.72
abad      -0.69
SAR       -0.68
gow       -0.67
sie       -0.65
ammy      -0.64
affer     -0.64
abama     -0.63
zac       -0.62
cross     -0.62
Positive Logits
Token     Logit
ually     1.21
uation    1.20
uated     1.10
habits    1.06
uate      0.95
uating    0.86
habit     0.83
uates     0.83
ality     0.82
itious    0.80
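The dashboard does not say how these logit values were produced. In many sparse-autoencoder dashboards, the values come from projecting the feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative tokens. The Python sketch below assumes exactly that and ignores any final layer norm or bias. The names `logit_effects` and `top_and_bottom` and the toy weights are hypothetical, not taken from the dashboard or the model.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot a feature's decoder direction with every token column of the unembedding.

    Assumption: `unembed` is laid out d_model x vocab, and no final layer norm
    or bias is applied (a simplification, not necessarily what the dashboard does).
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs (k >= 1)."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    bottom = [(tokens[t], effects[t]) for t in order[:k]]
    top = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example with made-up numbers (hypothetical, not the real model weights).
tokens = ["habit", "habits", "abad", "ually"]
decoder_dir = [1.0, 0.5]
unembed = [[0.8, 0.9, -0.6, 1.0],
           [0.1, 0.2, -0.2, 0.4]]

effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, tokens, k=2)
print("positive:", top)
print("negative:", bottom)
```

With real weights, `decoder_dir` would have length d_model and `unembed` would span the whole vocabulary. Sorting every token is acceptable for a sketch, but `heapq.nlargest` and `heapq.nsmallest` avoid a full sort.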
Activation Density: 0.030%
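The density figure is presumably the share of sampled token positions on which the feature fires. At 0.030%, that is about 3 positions in every 10,000. The sketch below assumes "fires" means an activation strictly above zero, because the dashboard states neither its threshold nor its sample size. The functions `activation_density` and `as_percent` and the sample data are hypothetical.

```python
from typing import Iterable, Sequence


def activation_density(activations: Iterable[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than 0; the dashboard
    does not state its threshold or the size of its token sample.
    """
    total = 0
    active = 0
    for sequence in activations:
        for value in sequence:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


def as_percent(fraction: float) -> str:
    """Format a fraction with three decimals, e.g. 0.0003 -> '0.030%'."""
    return f"{fraction * 100:.3f}%"


# Hypothetical sample: one sequence of 10,000 tokens with 3 nonzero activations.
sample = [[0.0] * 9_997 + [1.2, 0.4, 2.1]]
density = activation_density(sample)
print(as_percent(density))
```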