Explanation: phrases related to political and social issues
Negative Logits
shape          -0.80
habit          -0.78
carrier        -0.76
salv           -0.73
barg           -0.73
downs          -0.72
likeness       -0.72
pharmacy       -0.72
nude           -0.72
disguise       -0.71
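This list and the positive-logit list are typically produced by projecting the feature's output direction onto every vocabulary token and keeping the extremes. The page does not state its method, so the sketch below *assumes* each value is the dot product of the feature's decoder vector with a token's unembedding column. The names `w_dec`, `w_u`, `vocab`, `top_logit_tokens` and the toy numbers are hypothetical.

```python
import numpy as np

def top_logit_tokens(w_dec, w_u, vocab, k=10):
    """Rank tokens by how strongly one feature direction pushes their logits.

    Assumption (not from the dashboard): logit effect = w_dec @ w_u.
    w_dec: (d_model,) decoder direction for one feature (hypothetical input)
    w_u:   (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab: list of token strings, length vocab_size
    """
    logits = w_dec @ w_u            # (vocab_size,) direct effect on each token's logit
    order = np.argsort(logits)      # indices sorted from most negative to most positive
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
w_dec = np.array([1.0, -0.5])
w_u = np.array([[0.2, -0.8, 1.0, 0.0],
                [0.4, 0.2, -0.2, 0.6]])
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
neg, pos = top_logit_tokens(w_dec, w_u, vocab, k=2)
print(neg)
print(pos)
```

Because only the direct path through the unembedding is measured, these values describe what the feature "votes for" at the output, not what makes it activate.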
Positive Logits
ï¸ı             1.19
ï¸              1.06
Similarly       1.02
Additionally    1.02
Unfortunately   1.01
However         0.99
Therefore       0.98
However         0.98
Likewise        0.95
Obviously       0.94
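The first two entries, `ï¸ı` and `ï¸`, look garbled but are consistent with GPT-2-style byte-level BPE token strings, in which every raw byte is displayed as a printable stand-in character. The page does not name its tokenizer, so treat this as an assumption. The sketch rebuilds GPT-2's byte-to-character table (`bytes_to_unicode`, `token_to_bytes` and `CHAR_TO_BYTE` are illustrative names) and maps a token string back to raw bytes. Under that encoding, `ï¸ı` is the bytes EF B8 8F, the UTF-8 form of U+FE0F (variation selector-16, which commonly follows emoji). `ï¸` is an incomplete sequence: the first two bytes of a character in the U+FE00 to U+FE3F range.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 byte -> printable-character table."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and up.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Inverse table: stand-in character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}

def token_to_bytes(token: str) -> bytes:
    """Map a byte-level BPE token string back to its raw bytes."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)

raw = token_to_bytes("ï¸ı")
print(raw.hex())                      # efb88f
print(hex(ord(raw.decode("utf-8"))))  # 0xfe0f
partial = token_to_bytes("ï¸")
print(partial.hex())                  # efb8 (not valid UTF-8 on its own)
```

The two `However` rows are probably distinct tokens whose displayed text differs only in whitespace the page does not render; the byte mapping above would expose such a difference (a leading space appears as `Ġ`).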
Activation Density: 0.465%
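Activation density is usually reported as the share of token positions on which the feature's activation is above zero; at 0.465% that is roughly 1 token in 215. The sketch below assumes that definition (the page does not spell it out). The function name, the `threshold` parameter and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature fires (activation > threshold).

    acts: per-token activations for one feature (hypothetical input).
    Assumption: "density" means the firing rate over all scored tokens.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy data: 1,000 token positions, feature active on 5 of them.
acts = np.zeros(1000)
acts[[3, 150, 151, 600, 999]] = [0.7, 1.2, 0.4, 2.0, 0.1]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.500%
```

A low density like this one indicates a sparse feature; some dashboards use a small positive threshold instead of zero, which would lower the reported figure.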