Explanations
phrases related to established or traditional practices
references to traditional behaviors or established norms
Negative Logits
ighters    -0.68
merce      -0.67
semble     -0.65
UST        -0.64
scl        -0.63
Chic       -0.62
hma        -0.62
endez      -0.61
abella     -0.61
mingham    -0.60
Positive Logits
whereby     1.26
pmwiki      0.93
ually       0.90
tendency    0.85
refrain     0.78
reversal    0.77
horse       0.75
wherein     0.74
resorted    0.69
habit       0.68
Activation Density: 0.183%