INDEX
Explanations
positive qualities or high levels of significance in text
descriptors of quality and speed
Negative Logits
Maze        -0.69
odder       -0.59
adelphia    -0.58
Nile        -0.58
Ariel       -0.58
Ki          -0.58
Insight     -0.57
Nath        -0.57
Jude        -0.57
EE          -0.56
Positive Logits
ractive      0.79
(>           0.76
tarian       0.69
nered        0.66
xual         0.63
sexism       0.62
auld         0.62
uilt         0.62
compliments  0.60
ifiable      0.60
Activations Density 0.321%