INDEX
Explanations
contrasting or conditional statements
Negative Logits
ppl  0.80
plz  0.70
idk  0.66
lots  0.65
govt  0.64
đc  0.62
btw  0.62
pls  0.61
probs  0.59
approx  0.58
Positive Logits
Unlike  0.71
Alongside  0.67
вовсе  0.63
तकरीबन  0.61
ведь  0.60
Именно  0.57
Ведь  0.56
Surprisingly  0.55
Undoubtedly  0.55
Perhaps  0.55
Activations Density: 0.013%