INDEX
Explanations
student names
tokens that represent identifiers or potential data points
Negative Logits
Token     Logit
Stew      -0.80
disg      -0.72
wolves    -0.68
seiz      -0.67
eleph     -0.67
Mall      -0.67
microw    -0.66
Insp      -0.66
133       -0.66
Sally     -0.64
Positive Logits
Token     Logit
b          1.19
obar       1.00
Bib        0.97
bis        0.97
bs         0.95
bish       0.95
bar        0.94
bin        0.92
B          0.92
baugh      0.91
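The two logit lists rank vocabulary tokens by how strongly the feature pushes their output logits up or down. A common way such lists are produced for sparse-autoencoder features (an assumption here; the page does not say how these were computed) is to project the feature's decoder direction through the model's unembedding matrix and take the top-k and bottom-k entries. The minimal sketch below does this in plain Python; `logit_effects`, `top_k`, and all vectors and token names are hypothetical toy values, not data from this feature.

```python
# Hypothetical sketch: logit effect of a feature = decoder direction projected
# through the unembedding matrix, then ranked to get top/bottom tokens.
# All numbers and token names are toy values, not taken from a real model.

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: dot(decoder_dir, unembed[:, token_index])}.

    decoder_dir: list of floats, length d_model (the feature's output direction)
    unembed: d_model x vocab_size nested list (rows = hidden dims, cols = tokens)
    vocab: list of token strings, one per unembed column
    """
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(d * unembed[i][j] for i, d in enumerate(decoder_dir))
    return effects


def top_k(effects, k, largest=True):
    """Return the k (token, value) pairs with the largest (or smallest) values."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)[:k]


# Toy example with d_model = 2 and a three-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = [
    [0.2, -0.4, 0.9],  # hidden dim 0
    [0.6, -0.2, 0.1],  # hidden dim 1
]
vocab = ["tok_a", "tok_b", "tok_c"]

effects = logit_effects(decoder_dir, unembed, vocab)
print("positive:", top_k(effects, 2))
print("negative:", top_k(effects, 1, largest=False))
```

Real dashboards would do the same projection with a matrix library over the full vocabulary; the loop form here only makes the dot product explicit.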
Activation Density: 0.178%
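Activation density is usually read as the fraction of token positions on which the feature fires (activation above zero). The sketch below assumes that definition and a threshold of 0; the dashboard does not state its exact threshold or how many tokens were sampled. The counts in the usage line (178 active positions out of 100,000) are hypothetical, chosen only to show how a value of 0.178% arises.

```python
# Minimal sketch, assuming density = fraction of positions with activation > threshold.
# The threshold and the example token counts are assumptions, not from the dashboard.

def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 178 firing positions out of 100,000 tokens.
acts = [1.0] * 178 + [0.0] * (100_000 - 178)
print(f"{activation_density(acts):.3%}")
```

A density this low means the feature is silent on almost every token, which is typical for a sparse, narrowly tuned feature.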