Explanations
phrases or sentences ending with a period
sentences or phrases indicating conclusions or summaries
Negative Logits
Token           Logit
manif           -0.77
himself         -0.72
lled            -0.68
lier            -0.68
hammered        -0.67
emic            -0.67
contingency     -0.66
iste            -0.65
conservatism    -0.65
criminal        -0.65
Positive Logits
Token           Logit
Especially      1.12
Whether         1.04
Please          1.02
Specifically    1.01
Depending       1.01
Besides         0.99
Including       0.98
Learn           0.97
Includes        0.97
Unfortunately   0.96
Activations Density: 0.503%