Explanations
complex social and political themes in textual content
Negative Logits
Token          Logit
nonetheless    -0.76
fortunately    -0.74
accordingly    -0.73
thereafter     -0.73
!.             -0.61
Pwr            -0.60
.[             -0.60
anwhile        -0.58
."[            -0.58
.).            -0.58
Positive Logits
Token          Logit
virginity      0.62
?",            0.61
thood          0.60
wealth         0.57
gor            0.57
estamp         0.56
animate        0.56
zac            0.55
jerk           0.55
Guant          0.55
Activations Density: 5.397%