INDEX
Explanations
phrases indicating personal beliefs or opinions
statements of belief or conviction
Negative Logits
Token         Logit
cloth         -0.75
insert        -0.70
conservancy   -0.70
mentioned     -0.68
practice      -0.67
yna           -0.67
aste          -0.66
effect        -0.64
umber         -0.62
nice          -0.61
Positive Logits
Token         Logit
orean         0.76
believe       0.74
orea          0.73
ieve          0.73
believes      0.70
ij士          0.70
phas          0.70
rill          0.69
orians        0.68
POSE          0.66
Activations Density: 0.041%