INDEX
Explanations
negative phrases or concepts associated with technology
Negative Logits
polation      -0.93
iffance       -0.93
Efq           -0.91
ollary        -0.91
arote         -0.91
Theſe         -0.88
ratulations   -0.87
ousand        -0.86
BibitemShut   -0.85
encils        -0.84
Positive Logits
-          1.01
out        0.60
al         0.57
ISupport   0.54
M          0.54
ri         0.54
ie         0.54
_          0.53
T          0.52
re         0.52
Activations Density: 0.318%