Explanations
names of famous people, political entities, and contentious topics
Negative Logits
Token           Logit
assets          -0.86
effic           -0.85
gran            -0.83
proportions     -0.79
tips            -0.75
vulnerabilities -0.72
efficiency      -0.71
generated       -0.70
nutrition       -0.70
ł               -0.70
Positive Logits
Token           Logit
fray             1.75
bandwagon        1.25
chorus           1.20
ranks            1.01
fellowship       0.89
fold             0.89
rocal            0.81
conversation     0.80
CF               0.80
neau             0.79
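Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the most negative and most positive entries. The sketch below assumes exactly that (a decoder vector `decoder_dir` of length d_model and an unembedding `W_U` of shape d_model x vocab_size, with no final LayerNorm folded in); the function name `top_logit_effects` is hypothetical, and this is not necessarily how this particular dashboard computed its numbers.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: the logit effect of a feature is its decoder direction
    multiplied by the unembedding matrix; any final LayerNorm is ignored.
    """
    effects = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(effects)  # token indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: an identity unembedding, so the effects equal decoder_dir.
vocab = ["fray", "assets", "chorus", "tips"]
neg, pos = top_logit_effects([2.0, -1.0, 0.5, -0.5], np.eye(4), vocab, k=2)
print(neg)  # [('assets', -1.0), ('tips', -0.5)]
print(pos)  # [('fray', 2.0), ('chorus', 0.5)]
```

The toy values are made up for illustration and are unrelated to the logits reported for this feature.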
Activation density: 12.719% (the fraction of tokens on which the feature is active)
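Activation density is the share of token positions where the feature fires. A minimal sketch, assuming "fires" means an activation strictly greater than zero over some sample of tokens (the threshold and the function name `activation_density` are assumptions, not taken from the dashboard):

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold.
    """
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())


# Toy activations over 8 tokens: 3 of them are nonzero.
density = activation_density([0.0, 0.3, 0.0, 1.2, 0.0, 0.0, 0.0, 0.5])
print(f"{100 * density:.3f}%")  # 37.500%
```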