Explanations
names of people as well as terms related to news and events
letters and possibly proper nouns or acronyms within the text
Negative Logits
Generations    -0.69
proxies        -0.64
envy           -0.63
presidents     -0.62
ancest         -0.61
plague         -0.61
capsules       -0.59
coordinates    -0.58
dinosaurs      -0.58
TPP            -0.58

Positive Logits
awi             1.04
ava             0.93
adin            0.86
iday            0.85
aj              0.85
ouri            0.83
ady             0.82
oub             0.82
ieri            0.82
ibel            0.82
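The logit lists above rank output tokens by how strongly this feature pushes their logits down or up. As a minimal sketch, assuming the values come from the direct path only (the feature's decoder vector dotted with each token's unembedding column, ignoring the final LayerNorm), the ranking could be reproduced as follows. The names `top_logit_effects`, `decoder_dir`, and `unembed` are hypothetical and not part of any dashboard API.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank output tokens by a feature's direct effect on their logits.

    decoder_dir: list[float], length d_model (hypothetical feature decoder vector)
    unembed: d_model rows, each a list[float] of length d_vocab (hypothetical W_U)
    vocab: list[str], length d_vocab
    Returns (k most negative, k most positive) as (token, value) pairs.
    """
    d_vocab = len(vocab)
    # Assumed direct logit effect: dot product of the decoder vector
    # with each token's column of the unembedding matrix.
    effects = [
        sum(decoder_dir[m] * unembed[m][t] for m in range(len(decoder_dir)))
        for t in range(d_vocab)
    ]
    # Sort token ids from most negative to most positive effect.
    ranked = sorted(range(d_vocab), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in ranked[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(ranked[-k:])]
    return negative, positive


# Toy example: 2-dimensional model, 4-token vocabulary.
neg, pos = top_logit_effects(
    [1.0, 0.0],
    [[0.5, -1.0, 2.0, 0.0], [3.0, 3.0, 3.0, 3.0]],
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('b', -1.0), ('d', 0.0)]
print(pos)  # [('c', 2.0), ('a', 0.5)]
```

With a real model, `k=10` would yield two lists shaped like the ones above; the exact numbers depend on whether the dashboard also applies LayerNorm scaling, which this sketch does not.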
Activation density: 0.109%
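Activation density is commonly read as the share of token positions on which the feature is active (activation above zero); under that assumption, 0.109% means the feature fires on roughly one token in a thousand. A minimal sketch of that calculation, using a hypothetical helper `activation_density`:

```python
def activation_density(activations):
    """Percent of token positions where the feature is active (activation > 0).

    Assumes "active" means strictly positive, which is a common convention
    for sparse-autoencoder features but not confirmed by this page.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


print(activation_density([0.0, 0.3, 0.0, 0.0]))  # 25.0
# 109 active positions out of 100,000 gives the density shown above.
print(activation_density([1.0] * 109 + [0.0] * (100_000 - 109)))  # 0.109
```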