INDEX
Explanations
information related to events or actions that have caused outrage or concern in a community
Negative Logits
tyr        -0.79
Brach      -0.73
millenn    -0.61
Mock       -0.60
0004       -0.58
Inqu       -0.58
arsen      -0.57
avorite    -0.56
Mub        -0.56
etched     -0.55
Positive Logits
stretched  1.33
fitted     1.21
casts      1.14
dated      1.14
smart      1.13
doors      1.12
skirts     1.12
lier       1.09
fitting    1.07
lying      1.07
Activations Density: 0.048%