INDEX
Explanations
phrases centered around questioning legitimacy and authority
importance/truth
the validity/legitimacy
Negative Logits
hint          -0.53
никак         -0.48
réflexion     -0.48
Increment     -0.47
saja          -0.47
diatur        -0.46
geber         -0.46
dif           -0.46
increment     -0.46
discussion    -0.45
Positive Logits
validity        1.67
validity        1.30
legitimacy      1.29
effectiveness   1.25
veracity        1.21
suitability     1.20
authenticity    1.19
accuracy        1.19
correctness     1.19
adequacy        1.16
Activations Density: 0.625%
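
As a rough illustration of how a density figure like this can be read, the sketch below treats it as the fraction of sampled token positions on which the feature is active. This is an assumption, since the page does not say how the value was computed. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires; the source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 5 active positions out of 800 sampled tokens.
sample = [0.0] * 795 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

On this reading, a density of 0.625% would correspond to the feature firing on about 1 in every 160 sampled tokens.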