INDEX
Explanations
offensive strategies and force
Negative Logits
ל    0.87
ن    0.82
ل    0.71
م    0.71
س    0.66
מ    0.61
с    0.60
رو    0.59
ك    0.59
し    0.58
Positive Logits
’    0.55
,    0.49
ated    0.45
musical    0.44
Springsteen    0.43
Scotty    0.43
)’    0.42
animal    0.41
,’    0.40
)?    0.40
Activations Density: 0.001%