Explanations
references to loyalty and allegiance
Negative Logits
Token          Logit
ifrance        -0.55
continúas      -0.55
env            -0.53
sơ             -0.52
chromedriver   -0.51
تعدى           -0.50
nery           -0.49
mels           -0.49
zve            -0.49
öde            -0.49
Positive Logits
Token          Logit
loyal          1.41
loyal          1.02
loyalty        0.97
loyalty        0.97
Loyal          0.91
faithful       0.85
fidèle         0.82
Cyfeiriadau    0.82
Faithful       0.81
Loyalty        0.81
Activations Density 0.083%