INDEX
Explanations
phrases related to betrayal
words and themes related to betrayal and deception
Negative Logits
nesota      -0.88
nets        -0.79
aho         -0.76
lining      -0.73
ifying      -0.72
ums         -0.72
umin        -0.72
ULAR        -0.69
ifiers      -0.69
ocating     -0.68
Positive Logits
cipled       0.86
loyalty      0.78
oath         0.76
allegiance   0.76
edience      0.74
compass      0.73
betray       0.73
warm         0.72
kindness     0.70
jer          0.68
Activation Density: 0.101%