Explanations
expressions related to expectations and contradictions
Head Attr Weights
Head  Weight
0     0.01
1     0.02
2     0.06
3     0.07
4     0.14
5     0.02
6     0.04
7     0.34
8     0.03
9     0.05
10    0.06
11    0.13
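A minimal sketch of how the head attribution weights listed above might be ranked to find the dominant attention head. The values are copied from the list; the names `head_attr_weights` and `rank_heads` are hypothetical, introduced only for illustration, and are not part of any dashboard API.

```python
# Head attribution weights copied from the list above (head index -> weight).
# The variable and function names are illustrative assumptions.
head_attr_weights = {
    0: 0.01, 1: 0.02, 2: 0.06, 3: 0.07, 4: 0.14, 5: 0.02,
    6: 0.04, 7: 0.34, 8: 0.03, 9: 0.05, 10: 0.06, 11: 0.13,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
top_head, top_weight = ranked[0]
total = sum(head_attr_weights.values())

print(f"dominant head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

With these values, head 7 carries the largest weight (0.34), followed by heads 4 and 11. The listed weights sum to 0.97 rather than 1.00; this may be a rounding effect of the two-decimal display, but whether the weights are meant to be normalized is an assumption.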
Negative Logits
Token       Logit
NetMessage  -1.56
ocamp       -1.54
Reviewer    -1.51
CENT        -1.45
thanks      -1.43
Ping        -1.43
ust         -1.42
SELECT      -1.42
enaries     -1.41
LAN         -1.38
Positive Logits
Token         Logit
Marxism       1.62
reality       1.49
perceptions   1.47
Infinite      1.46
views         1.46
morals        1.45
philosophies  1.43
orthodoxy     1.41
beliefs       1.40
belief        1.38
Activations Density 0.001%