Explanations
words related to legal consequences or conditions
Negative Logits
iffe         -0.76
isoft        -0.67
imet         -0.61
enic         -0.61
Carbuncle    -0.61
ortun        -0.60
Neighbor     -0.60
ゼ           -0.59
anqu         -0.59
ainted       -0.58
Positive Logits
unequivocally   1.07
bluntly         0.99
emphatically    0.92
plainly         0.91
goodbye         0.79
categor         0.75
boldly          0.74
quo             0.74
"...            0.74
confidently     0.73
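The positive and negative logit lists above rank tokens by how strongly this feature pushes the model's output toward or away from them. A minimal sketch of one common way such lists are produced, assuming the score for each token is the dot product of the feature's decoder direction with that token's unembedding vector. The function names, the 2-dimensional vectors, and the token names "alpha" through "epsilon" are hypothetical toy values, not data from this feature.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed method)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive

# Hypothetical toy data: a 2-d decoder direction and 2-d unembedding vectors.
decoder_dir = [1.0, 0.5]
unembed = {
    "alpha": [0.8, 0.4],
    "beta": [0.6, 0.6],
    "gamma": [-0.5, -0.6],
    "delta": [-0.4, -0.2],
    "epsilon": [0.0, 0.1],
}

negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("negative:", [(t, round(s, 2)) for t, s in negative])
print("positive:", [(t, round(s, 2)) for t, s in positive])
```

With these toy vectors, "gamma" (-0.8) and "delta" (-0.5) land in the negative list and "alpha" (1.0) and "beta" (0.9) in the positive list. A real dashboard would run the same ranking over the full vocabulary.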
Activation Density: 0.011%
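A density of 0.011% means the feature is active on roughly 11 of every 100,000 tokens. The following minimal sketch shows the arithmetic. It assumes density is the fraction of evaluated tokens whose activation exceeds a threshold of zero. The `activation_density` function and its `threshold` parameter are hypothetical, and the activation values are synthetic.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature fires, i.e. activation > threshold
    (assumed definition; the threshold parameter is hypothetical)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0

# Synthetic example: 100,000 tokens, the feature fires on 11 of them.
acts = [0.0] * 100_000
for i in range(0, 11 * 9_000, 9_000):  # indices 0, 9000, ..., 90000 (11 tokens)
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")
```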