Explanations
statements of opinion or assertions
Negative Logits
figure     -0.70
pes        -0.69
velength   -0.65
mathemat   -0.64
ourses     -0.61
kefeller   -0.61
focal      -0.61
obser      -0.60
awar       -0.60
imeters    -0.58
Positive Logits
goodbye        1.06
hello          0.81
unequivocally  0.68
definitively   0.68
nobody         0.68
rists          0.66
there          0.66
that           0.65
confidently    0.64
────           0.64
Activations Density 0.022%