INDEX
Explanations
names or terms related to specific individuals
proper nouns and named entities
Negative Logits
Token      Logit
Ten        -0.83
Truman     -0.82
Morton     -0.78
Monarch    -0.78
142        -0.77
KR         -0.74
Sonny      -0.73
WP         -0.72
Ross       -0.72
Ten        -0.71
Positive Logits
Token      Logit
ig         1.85
IG         1.69
igs        1.67
igg        1.31
Zig        1.25
IG         1.23
Mig        1.22
rig        1.21
igraph     1.20
igi        1.19
Activations Density: 0.369%