Explanations
arguments or discussions related to specific beliefs or ideas
Negative Logits
ECA        -0.69
pin        -0.65
!.         -0.65
%.         -0.62
UF         -0.62
+.         -0.60
si         -0.60
yi         -0.58
kie        -0.57
ゴン       -0.57
Positive Logits
hesda       0.71
someone     0.71
somehow     0.70
ritical     0.70
bothered    0.67
someone     0.66
omission    0.65
hindsight   0.64
inaction    0.63
outsiders   0.62
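Dashboards of this kind commonly derive the negative and positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the tokens with the most extreme scores. The sketch below assumes that procedure; it does not reproduce any specific tool's code, may omit normalization steps a real dashboard applies, and uses hypothetical names (`decoder_dir`, `W_U`, `vocab`) with toy values.

```python
# Minimal sketch (assumed procedure): score every vocabulary token by the dot
# product of a feature's decoder direction with that token's unembedding column,
# then keep the k most negative and k most positive tokens.
# decoder_dir, W_U and vocab are hypothetical toy inputs, not real model weights.

def feature_logits(decoder_dir, W_U):
    """Return decoder_dir @ W_U: one logit effect per vocabulary token."""
    d_model = len(decoder_dir)
    n_vocab = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
            for t in range(n_vocab)]

def top_logits(decoder_dir, W_U, vocab, k):
    """Return (negative, positive) lists of (token, score), k entries each."""
    scores = feature_logits(decoder_dir, W_U)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens.
decoder_dir = [1.0, -0.5]
W_U = [[0.2, -0.4, 0.7, 0.0],
       [0.6, 0.2, -0.2, 0.1]]
vocab = ["pin", "si", "someone", "hindsight"]
neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print(neg)
print(pos)
```

Because the scores come from a fixed linear projection, they describe which output tokens the feature pushes up or down, independent of any particular input text.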
Activations Density 0.130%
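Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the function name `activation_density` are assumptions for illustration, not a documented definition from this page.

```python
# Minimal sketch (assumed definition): percentage of token positions whose
# feature activation exceeds a threshold (0.0 by default).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy example: 13 firing positions out of 10,000 gives a density of 0.130%.
acts = [0.0] * 10_000
for i in range(13):
    acts[i] = 1.0
print(f"{activation_density(acts):.3f}%")
```

Under this reading, a density of 0.130% means the feature is active on roughly 13 of every 10,000 tokens.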