INDEX
Explanations
references to interviews
Negative Logits
math        -0.74
mil         -0.73
rejoice     -0.68
blue        -0.66
borgh       -0.65
DIV         -0.62
bn          -0.61
axy         -0.59
wikipedia   -0.58
Flavoring   -0.58
Positive Logits
ees          1.33
ee           1.15
conducted    1.04
interview    0.89
transcripts  0.88
subjects     0.87
ioned        0.84
interviews   0.84
Transcript   0.81
transcript   0.80
Activations Density: 0.059%