INDEX
Explanations
information related to a specific person such as their achievements, occupations, personal wealth, and family members
mentions of a specific male subject or protagonist
Negative Logits
xxx           -0.77
ÃĹ            -0.74
âī            -0.74
âľ            -0.73
̶             -0.72
����          -0.72
—-            -0.72
DN            -0.71
poke          -0.70
ÏĢ            -0.70

Positive Logits
biggest       1.03
inability     1.03
detractors    1.01
successor     0.99
itage         0.99
goal          0.99
willingness   0.98
Majesty       0.97
newfound      0.96
youngest      0.96
Activations Density: 0.142%