Explanations
phrases indicating leading, guiding, or directing
phrases indicating causation or influence
Negative Logits
anooga      -0.68
TB          -0.65
roy         -0.63
sle         -0.62
checks      -0.61
RP          -0.59
spawned     -0.59
multiplier  -0.58
anni        -0.58
squeezed    -0.58
Positive Logits
believe      0.87
conclude     0.85
conclusions  0.81
Oliv         0.74
realize      0.73
realise      0.73
ãĤ©          0.73
pursue       0.71
discover     0.71
ixel         0.70
Activations Density 0.164%