INDEX
Explanations
specific numbers or codes
numerical identifiers and class designations
Negative Logits
Token         Logit
unaff         -0.64
plac          -0.62
anooga        -0.61
simulac       -0.56
unequiv       -0.56
disastrous    -0.55
relentless    -0.55
ographers     -0.55
Bout          -0.55
unwelcome     -0.54
Positive Logits
Token         Logit
58            1.51
79            1.51
84            1.51
70            1.50
81            1.50
57            1.48
76            1.47
59            1.47
88            1.47
71            1.47
Activation density: 0.188%
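A minimal sketch of how a density figure like this can be computed, assuming (this is not stated on the page) that density means the fraction of observed token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the example activations are hypothetical illustrations, not the dashboard's actual code or data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above `threshold`.

    Assumption: "density" = (# positions with activation > threshold) / (# positions).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical example: the feature fires on 188 of 100,000 observed tokens.
acts = [1.0] * 188 + [0.0] * (100_000 - 188)
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.188%
```

Under this definition, a density of 0.188% means the feature is silent on the vast majority of tokens, which fits a narrow explanation such as "specific numbers or codes".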