INDEX
Explanations
phrases indicating lack of knowledge or information
expressions of uncertainty or a lack of knowledge
Negative Logits
don       -0.71
eder      -0.67
eneg      -0.67
aud       -0.67
eworthy   -0.66
ILCS      -0.66
ternity   -0.66
gar       -0.63
eding     -0.62
eded      -0.62
Positive Logits
idea          1.14
ually         1.13
Idea          0.86
concept       0.86
moot          0.82
Ideas         0.82
ideas         0.77
premise       0.76
̶             0.74
proposition   0.73
Activations Density 0.018%
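
To put the density figure in concrete terms, here is a minimal sketch. It assumes "activations density" means the fraction of token positions on which the feature fires (activation above zero); the page itself does not define the term. The function name `activation_density` and the sample data are hypothetical, chosen only to reproduce the 0.018% figure, which corresponds to 18 active positions per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is read as the share of token positions on which
    the feature is active. This is an illustrative definition, not one
    taken from the dashboard.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 18 of 100,000 token positions.
sample = [1.0] * 18 + [0.0] * 99_982
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density this low would indicate a sparse feature that fires on only a small share of tokens.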