Explanations
military operations and psychology
Negative Logits
dır       0.58
p         0.52
ي         0.52
i         0.51
d         0.48
म         0.46
권        0.45
ку        0.45
चौथ       0.44
Лондон    0.44
Positive Logits
military     0.56
군           0.51
militaire    0.48
army         0.47
infantry     0.47
an           0.42
Army         0.40
ing          0.40
Army         0.40
军           0.40
Activation Density: 0.023%