INDEX
Explanations
details related to politics and international relations
Negative Logits
iliated         -0.64
omi             -0.62
affected        -0.61
aided           -0.61
tarian          -0.60
gur             -0.60
ollen           -0.59
IUM             -0.58
ifax            -0.57
supplemented    -0.57
Positive Logits
itiveness       0.80
Mouth           0.70
center          0.65
heels           0.63
oeuv            0.62
eat             0.62
Caesar          0.62
mole            0.61
bows            0.60
Rails           0.60
Activations Density: 3.024%