Explanations
phrases related to bias in various contexts
Negative Logits
Baltimore       -0.21
Kansas          -0.18
Kansas          -0.17
Worcester       -0.17
Nancy           -0.16
Maryland        -0.15
Arkansas        -0.15
Lowell          -0.15
illac           -0.15
Massachusetts   -0.15
Positive Logits
Titan           0.39
Titans          0.39
titan           0.37
tit             0.31
Titan           0.29
Attack          0.28
Tit             0.28
Tit             0.27
Levi            0.27
Survey          0.24
Activations Density 0.002%