INDEX
Explanations
references to scientific studies and their publication details
Negative Logits
Token        Logit
reddits      -0.79
[missing]    -0.63
tarian       -0.62
arson        -0.62
[missing]    -0.61
ueller       -0.59
404          -0.59
mute         -0.58
uart         -0.58
rog          -0.58
Positive Logits
Token          Logit
Proceedings    0.88
Lancet         0.87
IEEE           0.83
Perception     0.82
PLoS           0.80
Psychological  0.80
Journal        0.78
Signs          0.77
Quarterly      0.76
BMC            0.76
Activations Density: 0.033%