INDEX
Explanations
words related to ignoring or being ignored
Negative Logits
isher     -0.80
uster     -0.72
seed      -0.70
gans      -0.67
gain      -0.67
Finish    -0.66
ikuman    -0.65
gars      -0.64
kaya      -0.64
iov       -0.63
Positive Logits
warnings        1.14
inconvenient    1.10
pleas           0.95
obvious         0.89
glaring         0.89
cues            0.89
caveats         0.87
concerns        0.83
altogether      0.81
objections      0.81
Activations Density 0.113%