INDEX
Explanations
recurring themes related to ethical considerations and critical thinking in various contexts
Head Attr Weights
0:0.03
1:0.02
2:0.10
3:0.06
4:0.06
5:0.03
6:0.10
7:0.27
8:0.04
9:0.03
10:0.09
11:0.12
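
A minimal sketch of how the head attribution weights above could be ranked to see which heads contribute most. The names `head_attr_weights` and `top_heads` are hypothetical and not part of the dashboard. The values are copied from the list as displayed, so they are presumably rounded and need not sum to exactly 1.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_attr_weights = {
    0: 0.03, 1: 0.02, 2: 0.10, 3: 0.06, 4: 0.06, 5: 0.03,
    6: 0.10, 7: 0.27, 8: 0.04, 9: 0.03, 10: 0.09, 11: 0.12,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of the displayed weights, rounded to the precision shown on the page.
total = round(sum(head_attr_weights.values()), 2)
ranked = top_heads(head_attr_weights)

print(ranked)
print(total)
```

With the listed values, head 7 carries the largest weight (0.27), followed by head 11 (0.12).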
Negative Logits
signed       -1.49
adra         -1.40
tumblr       -1.34
ereo         -1.30
reunited     -1.29
joined       -1.26
mpeg         -1.25
released     -1.25
enced        -1.23
installed    -1.20
Positive Logits
unanswered    1.46
actionGroup   1.32
dilig         1.31
nostalg       1.31
anew          1.31
how           1.30
guessing      1.25
worrying      1.25
behavi        1.25
vulner        1.23
Activations Density 0.010%