Explanations
statements related to criticism or analysis of official reports or documents
Negative Logits
Token           Logit
                -0.69
window          -0.68
folk            -0.66
angular         -0.66
ãĥİ             -0.64
brance          -0.63
Copy            -0.62
mt              -0.62
etric           -0.61
immune          -0.60
Positive Logits
Token           Logit
sarcast         1.11
bluntly         1.02
emphatically    0.95
rhet            0.93
omin            0.90
noting          0.89
:"              0.88
passionately    0.87
confidently     0.85
secondly        0.83
Activation Density: 1.014%