Explanations
references to humanitarian relief efforts and disasters
Negative Logits
iglia     -0.14
rani      -0.14
idot      -0.14
pard      -0.14
.flat     -0.14
estead    -0.14
_ctxt     -0.14
oot       -0.13
ummies    -0.13
OTTOM     -0.13
Positive Logits
Red       0.56
Red       0.48
.Red      0.36
red       0.36
_Red      0.35
RED       0.35
_red      0.34
-red      0.34
红        0.34  (Chinese for "red")
red       0.33
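The logit lists above name the tokens whose output logits this feature most directly raises (positive) or lowers (negative). The following is a minimal sketch of how such lists can be produced. It assumes, without confirmation from this page, that each value is the dot product of the feature's decoder direction with the matching column of the model's unembedding matrix. The names `top_logits`, `decoder_dir`, and `w_u` are hypothetical.

```python
from typing import List, Tuple


def top_logits(
    decoder_dir: List[float],
    w_u: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) (token, logit) pairs.

    Assumptions (hypothetical layout):
      decoder_dir: the feature's decoder direction, length d_model.
      w_u: unembedding matrix stored as d_model rows of vocab_size entries.
    """
    d_model = len(decoder_dir)
    # Direct logit effect on each token j: sum_i decoder_dir[i] * W_U[i][j]
    logits = [
        sum(decoder_dir[i] * w_u[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    # Largest values first for the positive list, smallest first for the negative list
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers)
    pos, neg = top_logits(
        decoder_dir=[1.0, 0.5],
        w_u=[[0.4, 0.3, -0.1, 0.0], [0.2, 0.1, -0.1, -0.2]],
        vocab=["Red", "red", "oot", "pard"],
        k=2,
    )
    print(pos)
    print(neg)
```

The toy numbers are chosen only to show the ranking. Real dashboards compute this over the full vocabulary, where a matrix-vector product in a numerical library would replace the nested loop.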
Activation Density 0.016%
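Activation density is the share of tokens on which the feature is active. At 0.016%, that works out to about 16 tokens in every 100,000 (0.016 / 100 × 100,000 = 16), so this is a very sparse feature. The sketch below assumes that "active" means an activation strictly above a threshold of 0. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 16 active tokens out of 100,000 gives a density of about 0.016%
    acts = [1.0] * 16 + [0.0] * 99_984
    print(f"{activation_density(acts):.3f}%")
```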