INDEX
Explanations
I had, she needed, he couldn't
Negative Logits
Indeed         0.79
indeed         0.77
indeed         0.73
Indeed         0.71
effectivement  0.60
确实           0.57
infatti        0.55
Infatti        0.53
確かに         0.51
memang         0.50
Positive Logits
likely         0.44
struggles      0.43
unlikely       0.43
probably       0.42
struggle       0.41
say            0.39
serrat         0.38
ශ්ය            0.38
omt            0.38
딨             0.38
Activations Density: 0.012%