INDEX
Explanations
phrases indicating strong beliefs or predictions
modal verbs indicating possibility and capability
Negative Logits
CTR        -0.65
Arcade     -0.64
Steps      -0.63
Indie      -0.62
Goods      -0.61
Provided   -0.60
palms      -0.60
BG         -0.58
mats       -0.58
olor       -0.57
Positive Logits
lement     0.94
urious     0.83
nt         0.82
mint       0.79
ered       0.78
't         0.78
elt        0.76
anism      0.75
nir        0.75
scl        0.74
Activations Density: 0.145%