Explanations
No Explanations Found
Negative Logits
/       0.61
(       0.55
/       0.54
=       0.54
-       0.52
('      0.50
'       0.49
(=      0.49
        0.48
or      0.47
Positive Logits
said            0.79
dijo            0.77
తెలిపారు        0.75
spokeswoman     0.73
说道            0.71
сказал          0.71
spokesperson    0.70
afirmou         0.70
spokesman       0.70
ಹೇಳಿದರು         0.69
Activations Density 0.001%