INDEX
Explanations
significant numerical quantities or ratios
Negative Logits
badass          -0.23
freaking        -0.18
impactful       -0.18
transitioning   -0.17
policymakers    -0.17
transformative  -0.17
-focused        -0.16
-esque          -0.16
nuanced         -0.16
policym         -0.15
Positive Logits
:-↵               0.23
..........        0.22
........          0.21
................  0.20
:-                0.19
viz               0.19
,(                0.19
…………              0.19
!!                0.18
:-                0.18
Activations Density 0.808%