Explanations
terms related to quarrels or conflicts
Negative Logits
Token    Logit
454      -0.17
edly     -0.16
627      -0.15
ebra     -0.15
575      -0.15
hev      -0.15
cents    -0.15
Ori      -0.14
ø        -0.14
_ABI     -0.14
Positive Logits
Token    Logit
term     0.26
antine   0.25
rel      0.24
rels     0.24
ries     0.24
rell     0.21
tern     0.21
coop     0.20
REL      0.19
rel      0.19
Activation Density: 0.004%