INDEX
Explanations
goes directly against principles
Negative Logits
측정	0.86
لاحظوا	0.86
immersive	0.81
immersive	0.79
rhyth	0.79
তৈরির	0.77
भोग	0.76
எடுத்து	0.76
跏	0.75
hythmic	0.75
Positive Logits
betrayal	2.13
traitor	2.12
loyalty	2.10
betray	1.99
loyal	1.94
allegiance	1.89
Loyalty	1.84
betrayed	1.81
বিশ্বাসঘাত	1.65
Loy	1.55
Activations Density: 0.116%