Explanations
negations and phrases that challenge the concept of independence or ownership
Negative Logits
oway        -0.83
aughtered   -0.77
VER         -0.76
owship      -0.72
tons        -0.70
original    -0.69
very        -0.69
rans        -0.69
NA          -0.66
enery       -0.66
Positive Logits
mere        0.77
akin        0.73
artif       0.70
confer      0.67
disadvant   0.67
merely      0.65
implying    0.65
conflic     0.65
extortion   0.63
sectarian   0.63
Activations Density 0.139%