INDEX
Explanations
self-defense and self-positioning
Negative Logits
.      0.69
'      0.61
(      0.57
’      0.55
-      0.51
:      0.51
ní     0.51
EN     0.50
AD     0.50
我     0.49
Positive Logits
democracies     0.50
or              0.49
has             0.46
ஒவ்வொரு         0.45
aerosol         0.44
stripped        0.43
thermonuclear   0.43
cardiovascular  0.43
ꞎ               0.43
بر              0.42
Activation Density: 0.006%