INDEX
Explanations
words related to conversation and discussion
discussions surrounding social interactions and relationships
Negative Logits
policymakers    -0.60
administr       -0.59
relying         -0.57
analysts        -0.54
broadly         -0.54
economically    -0.53
reliance        -0.52
economic        -0.52
analysis        -0.50
incumbent       -0.50
Positive Logits
fucking         0.82
FUCK            0.80
fuck            0.77
fuckin          0.76
shit            0.76
shitty          0.75
HAHAHAHA        0.73
goddamn         0.72
fuck            0.71
Fuck            0.71
Activations Density 2.561%