INDEX
Explanations
prompts asking for opinions or thoughts
questions asking for opinions or thoughts
Negative Logits
announced     -0.79
Adin          -0.71
clad          -0.70
iere          -0.66
licensed      -0.65
itz           -0.62
wealth        -0.62
known         -0.61
Fund          -0.60
documented    -0.60
Positive Logits
estyles       0.72
76561         0.70
constitu      0.68
about         0.67
aptic         0.67
IUM           0.66
rison         0.65
disapprove    0.64
ABOUT         0.64
aloud         0.64
Activations Density 0.029%