Explanations
personal pronouns followed by possessive pronouns
references to a specific individual or subject
Negative Logits
Ò       -0.79
OTA     -0.75
DN      -0.75
PLA     -0.72
����    -0.70
2500    -0.70
âī      -0.70
—-      -0.67
1200    -0.67
ÏĢ      -0.67
Positive Logits
inability     0.96
biggest       0.91
itage         0.90
ths           0.87
goal          0.87
Majesty       0.86
opinions      0.85
willingness   0.85
self          0.82
favourite     0.82
Activations Density 0.151%