INDEX
Explanations
terms related to conflict or opposition, including descriptions of war crimes and political tensions
Negative Logits
Introduced    -0.78
nod           -0.78
ilk           -0.70
nce           -0.69
answer        -0.66
eus           -0.64
ghan          -0.64
thereof       -0.63
bie           -0.63
iments        -0.62
Positive Logits
sorts         1.22
theirs        0.87
attrition     0.82
Roses         0.75
course        0.73
hers          0.73
catch         0.72
ãĥĦ           0.72
ours          0.71
yours         0.71
Activations Density 0.090%