Explanations
non-English phrases or words
Negative Logits
Additionally  0.82
Similarly  0.79
Alternatively  0.78
Similarly  0.77
وكذلك  0.76
他にも  0.74
это  0.73
này  0.71
Aside  0.71
Aside  0.70
Positive Logits
bowels  0.55
কখনও  0.52
quas  0.49
assigns  0.48
mehrere  0.47
mittels  0.47
suddiv  0.47
disturbances  0.46
punyai  0.46
ஏற்படுத்த  0.46
Activations Density: 0.336%