INDEX
Explanations
discussions about authority figures and their actions
Negative Logits
المعيارى  -0.62
loroethane  -0.52
مواليد  -0.51
+#+#  -0.51
SequentialGroup  -0.50
AddTagHelper  -0.47
aires  -0.46
hower  -0.45
ग्राहक  -0.45
ويكيميديا  -0.45
Positive Logits
hoped  0.85
hoping  0.73
feared  0.72
believe  0.71
everywhere  0.71
flocked  0.70
hopes  0.70
rejoiced  0.68
hope  0.66
speculate  0.65
Activations Density 0.266%