Explanations
I/we followed by informal verbs
Negative Logits
Additionally  0.63
此外  0.55
较为  0.51
additional  0.51
Additionally  0.49
additional  0.49
Additional  0.48
एवं  0.47
Furthermore  0.46
অপরদিকে  0.46
Positive Logits
didn  0.92
dunno  0.86
gotta  0.84
kinda  0.80
probably  0.79
gonna  0.78
hadn  0.77
couldn  0.77
ain  0.77
wasn  0.77
Activations Density 0.293%