INDEX
Explanations
phrases indicating uncertainty or hesitation
Negative Logits
ãĥŀ           -0.57
abst          -0.54
depreciation  -0.54
REDACTED      -0.54
sow           -0.53
1886          -0.51
ol            -0.51
rupture       -0.50
coincide      -0.50
avg           -0.48
Positive Logits
pering   0.74
borgh    0.73
thing    0.65
lins     0.65
paste    0.65
forward  0.64
tree     0.63
links    0.62
ston     0.62
emade    0.61
Activations Density: 5.837%