Explanations
mentions of research studies and scientific findings
Negative Logits
shaw        -0.73
warr        -0.72
dunno       -0.68
scribe      -0.67
improves    -0.66
violates    -0.65
recovers    -0.64
heit        -0.64
ends        -0.63
acy         -0.62
Positive Logits
recent        0.83
recently      0.76
unveiling     0.74
glimps        0.73
vividly       0.71
Exhibit       0.70
recent        0.70
revelations   0.69
excerpts      0.68
tales         0.68
Activations Density 2.648%