INDEX
Explanations
names of individuals and their colleagues
references to specific individuals or researchers in scientific contexts
Negative Logits
Interstitial   -0.83
venge          -0.75
interstitial   -0.74
treason        -0.72
oath           -0.71
punishable     -0.70
Appearances    -0.68
stab           -0.68
bombed         -0.67
thumbnails     -0.67
Positive Logits
researcher     1.01
Aviv           0.99
Krish          0.94
Heather        0.91
Melanie        0.90
biologist      0.90
Karin          0.90
Zhu            0.90
Zhang          0.89
earcher        0.89
Activations Density: 0.931%