Explanations
expressions related to personal reflection and societal responsibilities
Negative Logits
probably    -0.15
enson       -0.15
real        -0.15
bon         -0.15
vera        -0.15
direct      -0.14
slightly    -0.14
rather      -0.14
Cot         -0.14
rna         -0.14
Positive Logits
anymore     0.31
nor         0.23
ANY         0.20
anybody     0.18
任何        0.16
nor         0.15
aeda        0.15
इतन         0.15
nào         0.15
ίτ          0.15
Activations Density 0.178%