Explanations
phrases indicating confrontation or opposition
Negative Logits
Helpful      -0.69
catentry     -0.68
vous         -0.66
poisoning    -0.62
ghan         -0.60
otom         -0.59
ients        -0.58
irlf         -0.57
batch        -0.56
zsche        -0.56
Positive Logits
atoon         0.75
doorway       0.69
ailable       0.69
against       0.68
龍契士        0.67
ardless       0.67
elight        0.67
thouse        0.66
wark          0.66
iage          0.65
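Each logit list ranks tokens by a signed value. The dashboard does not say how these values were computed. One common approach for sparse-autoencoder dashboards, assumed here, projects the feature's decoder direction onto each token's unembedding vector and keeps the most negative and most positive results. The sketch below shows only that ranking step, using made-up toy vectors. The names `logit_effects` and `top_and_bottom` and all numbers are hypothetical, not real model weights.

```python
# Hypothetical sketch: rank tokens by a feature's direct logit effect.
# ASSUMPTION: effect(token) = dot(decoder_direction, unembedding[token]).

def logit_effects(decoder_direction, unembedding):
    """Return {token: dot(decoder_direction, unembedding[token])}."""
    return {
        tok: sum(d * u for d, u in zip(decoder_direction, vec))
        for tok, vec in unembedding.items()
    }

def top_and_bottom(effects, k):
    """Return (k most negative, k most positive) lists of (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # descending, so largest first
    return negative, positive

if __name__ == "__main__":
    # Toy 3-dimensional example; vectors are invented for illustration.
    direction = [1.0, 0.0, -0.5]
    unembedding = {
        "against": [0.6, 0.2, -0.2],
        "batch": [-0.4, 0.1, 0.3],
        "doorway": [0.5, 0.0, 0.0],
        "vous": [-0.3, 0.5, 0.4],
    }
    neg, pos = top_and_bottom(logit_effects(direction, unembedding), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

On a real model, the dictionary comprehension would be replaced by one matrix-vector product over the whole vocabulary. The sort-and-slice logic stays the same.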
Activation Density: 0.064%
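This sketch assumes activation density means the percentage of token positions on which the feature fires, that is, where its activation is above zero. Under that reading, 0.064% corresponds to about 64 active positions per 100,000 tokens. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: activation density as a percentage of token positions.
# ASSUMPTION: a position counts as "active" when its activation > threshold.

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

if __name__ == "__main__":
    # 64 active positions out of 100,000 tokens.
    acts = [0.0] * 99_936 + [1.0] * 64
    print(f"{activation_density(acts):.3f}%")
```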