INDEX
Explanations
mentions of critiquing or criticism
references or mentions of critiques and criticism
Negative Logits
orthy       -0.76
loo         -0.69
Aad         -0.69
velt        -0.67
Mandela     -0.64
Patent      -0.64
Permanent   -0.63
noon        -0.63
Petty       -0.62
Argent      -0.61
Positive Logits
erion       1.45
ique        1.34
iques       1.30
icism       1.23
iqu         1.17
eria        1.06
ters        1.00
ically      0.99
ter         0.90
icals       0.87
Activations Density: 0.041%