INDEX
Explanations
No Explanations Found
Negative Logits
ה           0.76
However     0.68
F           0.65
A           0.62
But         0.62
K           0.58
V           0.58
или         0.57
E           0.57
Z           0.57
Positive Logits
thus            1.02
therefore       0.94
hence           0.93
frankly         0.87
consequently    0.85
therefor        0.83
voila           0.83
whatnot         0.81
luckily         0.81
hopefully       0.80
Activations Density: 0.985%