INDEX
Explanations
abbreviations or titles followed by a numerical value and a statement
statements or sections that present factual information
Negative Logits
Klux      -0.85
ctic      -0.76
isoft     -0.75
avorite   -0.70
prus      -0.68
avy       -0.67
yss       -0.67
xit       -0.66
itsch     -0.65
JV        -0.65
Positive Logits
ually     1.18
orial     1.08
ional     1.08
oids      0.99
itious    0.95
ially     0.92
oid       0.89
ored      0.86
Fact      0.85
ual       0.85
Activations Density 0.027%