INDEX
Explanations
words related to levels of accuracy or correctness
Negative Logits
olan        -0.74
orer        -0.72
ogi         -0.70
lessness    -0.69
uments      -0.67
UCT         -0.67
uty         -0.67
Highly      -0.66
no          -0.65
licts       -0.65
Positive Logits
anymore     0.81
Enough      0.76
icable      0.74
fit         0.72
ifiable     0.70
enough      0.70
reconcil    0.68
finished    0.67
bothered    0.67
spoon       0.67
Activations Density 0.059%