Explanations
specific phrases indicating support for certain political actions or ideas
Negative Logits
Token       Logit
ĸļ          -0.76
ontent      -0.75
reflect     -0.72
matched     -0.70
esters      -0.68
ngth        -0.68
days        -0.67
holes       -0.66
owing       -0.65
answered    -0.65
Positive Logits
Token           Logit
idea            1.57
notion          1.34
proposition     1.23
proposal        1.20
candidacy       1.12
inclusion       1.11
legalization    1.04
creation        1.02
concept         1.01
thesis          0.97
Activations Density: 0.232%