Index
Explanations
when you're / when we're / trying to
Negative Logits
della        0.48
emphasised   0.48
asserted     0.47
tiet         0.45
inhibited    0.45
intellect    0.45
contradicts  0.45
咟           0.45
delle        0.44
winner       0.44
Positive Logits
0            0.64
ead          0.46
item         0.45
IM           0.45
Est          0.43
trials       0.43
וג           0.43
agles        0.43
SS           0.42
FL           0.41
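A common way to produce two lists like these (assumed here, not stated on this page) is to project the feature's decoder direction onto each token's unembedding vector and rank tokens by the resulting dot product. The sketch below shows that ranking step on a toy example. The function names `logit_effects` and `top_logits`, the 3-dimensional direction, and the four-token vocabulary are all hypothetical illustrations. The scores it produces are signed, so the "negative" list holds the most negative values.

```python
# Minimal sketch (assumption): logit lists come from projecting a feature's
# decoder direction onto each token's unembedding vector, then sorting.
from typing import Dict, List, Tuple


def logit_effects(
    decoder_direction: List[float],
    unembed: Dict[str, List[float]],
) -> Dict[str, float]:
    """Dot product of the (hypothetical) feature direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in unembed.items()
    }


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example: hypothetical 3-d direction and four-token vocabulary.
    direction = [1.0, -0.5, 0.0]
    unembed = {
        "item": [0.4, 0.0, 1.0],
        "winner": [-0.2, 0.6, 0.0],
        "trials": [0.5, -0.2, 0.5],
        "della": [-0.1, 0.4, 0.2],
    }
    pos, neg = top_logits(logit_effects(direction, unembed), k=2)
    print("positive:", [token for token, _ in pos])  # ['trials', 'item']
    print("negative:", [token for token, _ in neg])  # ['winner', 'della']
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.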
Activations Density 0.002%
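Activation density is presumably the share of token positions on which this feature fires. Under that reading, 0.002% works out to about 2 tokens in every 100,000. The sketch below computes the figure and assumes "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and the toy activation list are hypothetical.

```python
# Minimal sketch (assumption): density = percentage of token positions
# whose feature activation is strictly above `threshold` (default 0).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of positions with activation > threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: 100,000 token positions, the feature fires on two of them.
    acts = [0.0] * 100_000
    acts[10] = 1.3
    acts[500] = 0.2
    print(f"{activation_density(acts):.3f}%")  # 0.002%
```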