INDEX
Explanations
phrases questioning or comparing actions or choices
be verbs that question the nature or morality of actions
Negative Logits
esa           -0.69
places        -0.64
aneers        -0.64
Working       -0.63
Dreams        -0.61
Dragonbound   -0.61
FTWARE        -0.59
Position      -0.59
cko           -0.58
irds          -0.57
Positive Logits
omorphic      0.83
Ͻ             0.80
hap           0.78
nt            0.73
gur           0.72
berra         0.71
olated        0.68
anybody       0.68
earch         0.68
senal         0.65
Activations Density: 0.130%