Explanations
phrases related to intentionally disregarding something
instances of the word "ignore" and its variations
Negative Logits
unal        -0.68
uliffe      -0.68
ickr        -0.67
ramer       -0.66
ccording    -0.65
millenn     -0.64
Stars       -0.63
amide       -0.62
ikuman      -0.62
creation    -0.62
Positive Logits
ibly            0.90
ibility         0.72
lessly          0.67
underestimate   0.64
Sakuya          0.64
illy            0.63
aside           0.63
prejudice       0.61
fulness         0.61
erella          0.60
Activations Density: 0.028%