Explanations
questions about the truth or nature of statements made
questions that challenge assumptions or beliefs
Negative Logits
bies        -0.90
dit         -0.79
usters      -0.77
tions       -0.73
papers      -0.71
Topics      -0.71
umbn        -0.69
ixels       -0.69
ventures    -0.68
former      -0.67
Positive Logits
conceivable  1.02
really       0.95
worth        0.91
possible     0.89
Possible     0.88
ever         0.87
Really       0.85
worthwhile   0.83
feasible     0.83
REALLY       0.83
Activations Density 0.057%