INDEX
Explanations
numerical data and group classifications
NEGATIVE LOGITS
Garc      -0.72
ped       -0.71
ドラ      -0.70
writers   -0.69
Cro       -0.69
respons   -0.68
PO        -0.64
rote      -0.64
Huss      -0.62
Zo        -0.61
POSITIVE LOGITS
Ĵ         0.69
────      0.69
●         0.69
===       0.69
®         0.65
】        0.65
çͰ        0.64
Ĩ         0.61
²         0.60
イト      0.60
Activation Density: 0.039%