Explanations
mentions of negative experiences or events
terms related to systematic oppression and political manipulation
Negative Logits
usercontent  -0.52
laws         -0.51
ramid        -0.50
Ire          -0.49
ufact        -0.48
":[          -0.46
regul        -0.46
adolesc      -0.46
doors        -0.46
]),          -0.46
Positive Logits
onge            0.54
Description     0.52
·               0.51
[/              0.51
escription      0.51
âľ              0.50
DragonMagazine  0.50
START           0.48
inka            0.47
};              0.47
Activations Density 1.806%