INDEX
Explanations
phrases related to achieving a goal or completing a task
Negative Logits
Token       Logit
Tanz        -0.68
Lies        -0.63
cardinal    -0.63
Inher       -0.62
iasco       -0.59
defe        -0.58
iege        -0.58
alias       -0.57
folly       -0.56
ynski       -0.56
Positive Logits
Token       Logit
rid         1.55
acquainted  1.05
away        0.94
Started     0.94
ahead       0.87
reimb       0.87
into        0.86
INTO        0.86
traction    0.85
distracted  0.85
Activations Density 0.103%