Explanations
terms associated with influence and definition
Negative Logits
iphate      -0.71
DERR        -0.70
idel        -0.69
ighters     -0.69
igers       -0.67
ãĥı         -0.67
intend      -0.67
aeus        -0.65
sels        -0.64
leased      -0.64
Positive Logits
contemporary    0.85
everything      0.80
our             0.78
discussions     0.78
debates         0.77
both            0.75
modern          0.74
anthropology    0.72
colonialism     0.72
many            0.71
Activations Density: 0.071%