INDEX
Explanations
people, actions, or attributes associated with a subsection of a larger group
phrases that identify and describe individuals and their actions or characteristics
Negative Logits
Merit        -0.64
Ted          -0.62
roundup      -0.61
Annie        -0.60
Quantity     -0.59
Dialog       -0.58
Newsletter   -0.58
Lou          -0.58
Fran         -0.58
Bots         -0.58
Positive Logits
iris          0.85
mol           0.79
contemplate   0.78
rir           0.77
preceded      0.76
perceive      0.74
cients        0.74
ikk           0.73
pires         0.73
oppose        0.72
Activations Density 0.120%