Explanations
phrases related to knowledge, power, and influence
terms related to psychological conditions and complex concepts
Negative Logits
Token        Logit
izont        -0.49
veter        -0.49
ificantly    -0.49
ensibly      -0.47
concess      -0.43
itialized    -0.43
yss          -0.43
oret         -0.43
thous        -0.43
privately    -0.42
Positive Logits
Token        Logit
.[           1.20
.            1.16
*.           1.05
.(           1.01
+.           0.99
.).          0.97
.</          0.97
.*           0.96
."[          0.95
'.           0.95
Activations Density: 1.378%
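
The density figure above is not defined on this page. A minimal sketch, assuming it means the fraction of sampled tokens on which the feature's activation is above zero; the function name, the threshold parameter, and the sample values are hypothetical and not taken from the source:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires. The page itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations for 8 tokens; 2 are nonzero.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 1.378% would mean the feature is active on roughly 14 of every 1,000 sampled tokens.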