INDEX
Explanations
terminology related to nuclear weapons and treaties
Negative Logits
reper        -0.85
declass      -0.84
confisc      -0.80
persecuted   -0.75
persecut     -0.75
assass       -0.72
rescuing     -0.72
sidx         -0.72
captives     -0.71
Fukushima    -0.70
Positive Logits
erson        0.74
Melody       0.70
Brush        0.68
ritten       0.68
riter        0.68
itch         0.67
Cherokee     0.66
inct         0.66
Experience   0.65
ois          0.64
Activations Density: 0.139%