INDEX
Explanations
comparisons between different types of variables or entities
comparisons between different groups or entities
Negative Logits
Pacific       -0.58
achine        -0.58
revenge       -0.56
oval          -0.56
FI            -0.55
eco           -0.55
Nikki         -0.54
retro         -0.54
immunity      -0.54
Carly         -0.53
Positive Logits
counterparts  0.90
().           0.83
anymore       0.77
pees          0.75
*.            0.74
attRot        0.73
+.            0.72
existed       0.71
nor           0.71
$.            0.70
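Dashboards like this one commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The page does not say how its lists were produced, so treat the following as a minimal sketch under that assumption. The names `logit_effects` and `top_and_bottom`, and the toy weights in the usage example, are all hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score every vocabulary token by the dot product of the feature's
    decoder direction with that token's unembedding column.

    Assumed (hypothetical) layout: unembed is d_model rows x vocab_size columns.
    """
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens and the k lowest-scoring tokens
    (lowest first), mirroring a positive/negative logit listing."""
    ranked = sorted(scores, key=lambda ts: ts[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy, made-up weights: a 2-dimensional residual stream and a 3-token vocabulary.
decoder_dir = [1.0, 0.0]
unembed = [[0.9, -0.5, 0.1],
           [0.0, 0.3, 0.2]]
vocab = ["counterparts", "Pacific", "eco"]

top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
print(top, bottom)
```

The scores are raw dot products. A real dashboard may normalize or scale them differently, so the toy numbers are not comparable to the values listed above.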
Activation density: 0.422%
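Activation density is usually read as the percentage of sampled tokens on which the feature fires. The page does not state its threshold, so the sketch below assumes that "fires" means an activation strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    The zero default threshold is an assumption, not the dashboard's documented rule.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One of four toy tokens is active, giving a density of 25%.
print(activation_density([0.0, 0.3, 0.0, 0.0]))
```

Under this reading, a density of 0.422% means the feature is active on roughly 4 of every 1,000 tokens.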