Explanations
prejudiced or diverse views
Negative Logits
Token        Logit
鼻子         0.73
microsc      0.65
agod         0.62
̣ng          0.62
盒子         0.61
description  0.60
lycerin      0.60
activator    0.60
刹           0.60
dance        0.58
Positive Logits
Token     Logit
opinions  2.91
views     2.69
opinion   2.57
Opinions  2.54
Views     2.45
Views     2.32
Opinion   2.26
opinion   2.23
views     2.23
Opin      2.22
Activations Density: 0.408%