INDEX
Explanations
words related to historical events and names
connections between thoughts and experiences
Negative Logits
.[      -0.81
.       -0.77
.–      -0.76
;       -0.76
.","    -0.72
.''.    -0.72
ãĢĤ     -0.68
."[     -0.68
."      -0.68
.''     -0.67
Positive Logits
ctive       0.55
isations    0.47
izens       0.46
Scotland    0.43
ctory       0.43
ws          0.43
Sundays     0.43
ifts        0.43
Fulton      0.43
izations    0.42
Activations Density 2.128%