INDEX
Explanations
thematic contrasts and statements about identity or absence of belief
Negative Logits
antMatchers   -0.56
########.     -0.56
וויק   -0.54
}))           -0.54
estekak       -0.52
#+#           -0.52
ويكيپيديا   -0.51
Aristi        -0.50
Himo          -0.50
jScrollPane   -0.49
Positive Logits
unlike     0.85
unlike     0.79
avoid      0.78
RTLR       0.77
avoid      0.75
avoiding   0.74
avoided    0.72
NOT        0.72
Unlike     0.71
contrary   0.71
Activations Density 0.382%