INDEX
Explanations
occurrences of specific quantitative or evaluative terms related to assessments or judgments
Negative Logits
?),     -0.18
?-      -0.18
zer     -0.17
?):     -0.15
');</   -0.14
Balt    -0.14
feud    -0.14
venida  -0.14
Rodrig  -0.14
hone    -0.14
Positive Logits
)?      0.55
)?↵     0.52
)?↵↵    0.49
]?      0.48
”?      0.48
"?      0.47
"?↵↵    0.46
))?     0.44
}?      0.39
'?      0.38
Activations Density: 0.085%