INDEX
Explanations
investigate complaints, allegations, deaths
Negative Logits
м    3.06
م    2.78
مة    2.02
ت    1.94
ਣੀ    1.89
х    1.87
י    1.84
ม    1.82
мся    1.81
ם    1.81
Positive Logits
f    2.33
Investigation    2.03
at    1.88
an    1.82
ent    1.74
ad    1.72
un    1.70
ut    1.70
ig    1.69
اب    1.65
Activations Density: 0.018%