INDEX
Explanations
defense with attorneys or security
NEGATIVE LOGITS
y           0.44
cer         0.43
super       0.43
PE          0.41
dop         0.41
finde       0.40
发出        0.40
om          0.39
嵘          0.39
conserved   0.38
POSITIVE LOGITS
دفاع        0.78
defensive   0.73
defens      0.71
against     0.70
defesa      0.69
Defensive   0.66
defence     0.65
defending   0.64
defend      0.64
défendre    0.64
Activations Density 0.025%