INDEX
Explanations
phrases related to actions, activities, and tasks
expressions of dissatisfaction or needs related to resources or processes
Negative Logits
issance     -0.76
enegger     -0.68
abus        -0.67
Naz         -0.65
lich        -0.65
ileaks      -0.64
Merit       -0.64
xon         -0.63
justice     -0.63
Pitt        -0.63
Positive Logits
their        1.22
themselves   1.18
their        1.03
theirs       1.01
THEIR        0.91
varying      0.90
Their        0.86
unrealistic  0.86
shortcuts    0.85
these        0.84
Activations Density: 0.547%