INDEX
Explanations
expressing "I" or "we" statements
Negative Logits
하였다  0.61
其  0.56
하였다  0.54
하였습니다  0.52
原来  0.52
एवं  0.52
ത്വം  0.52
使其  0.51
하였  0.50
原來  0.48
Positive Logits
definitely  0.79
aren  0.78
KNOW  0.73
know  0.73
certainly  0.73
确实  0.72
knows  0.71
isn  0.67
know  0.64
recognize  0.62
Activations Density 0.114%