Explanations
The word "no" as a prevalent theme in the text snippets that activate this feature
Negative Logits
Token     Logit
RAFT      -0.84
romy      -0.65
mosp      -0.62
jet       -0.59
ousand    -0.56
inese     -0.56
rex       -0.55
ahime     -0.55
nesses    -0.55
encia     -0.54

Positive Logits
Token     Logit
xious     1.30
longer    1.20
matter    0.93
doubt     0.91
ct        0.90
obs       0.83
except    0.80
xus       0.80
icably    0.79
otrop     0.78
Activation Density: 0.069%