INDEX
Explanations
phrases related to conversations or interactions between individuals
phrases indicating assertiveness or strong opinions
New Auto-Interp
Negative Logits
'."          -0.70
Pelosi       -0.67
Sinclair     -0.66
Pai          -0.65
NOW          -0.64
'>           -0.64
cms          -0.63
pering       -0.60
.'"          -0.59
»Ĵ           -0.59
Positive Logits
definitely   0.94
initely      0.88
laughs       0.87
Laughs       0.80
Originally   0.78
wcsstore     0.77
yeah         0.77
depends      0.75
absolutely   0.74
Originally   0.73
Activation Density: 0.529%