INDEX
Explanations
phrases expressing beliefs or opinions
assertions of belief or conviction
Negative Logits
cloth         -0.71
aste          -0.71
details       -0.66
Household     -0.64
ham           -0.62
Navigation    -0.62
Pavilion      -0.61
effect        -0.61
abb           -0.61
Appearance    -0.61
Positive Logits
passionately  0.82
ħ             0.78
§             0.75
phas          0.73
compelled     0.70
POSE          0.70
believe       0.69
fully         0.68
ieve          0.67
rica          0.67
Activations Density: 0.044%