INDEX
Explanations
phrases related to negative situations or critical viewpoints
Negative Logits
roma      -0.69
ellen     -0.67
audi      -0.64
entirety  -0.62
ofer      -0.61
ibia      -0.61
ighth     -0.59
ij士      -0.58
Disk      -0.58
orney     -0.57
Positive Logits
sidx      0.83
traction  0.77
quicker   0.75
quished   0.73
retty     0.73
quick     0.71
quickly   0.70
faster    0.70
ipolar    0.70
ãĤ¼       0.68
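The two logit lists read as the tokens whose output scores this feature lowers or raises the most. As a rough sketch only, assuming the per-token values are already available as a mapping, the lists could be produced by sorting. The function name `top_logits` and the parameter `k` are hypothetical, and the sample values are copied from the lists above.

```python
def top_logits(logit_effects, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    logit_effects: dict mapping token string -> logit value (assumed input).
    """
    # Sort ascending by value, so the most negative entries come first.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    # Reverse the ascending order to read off the most positive entries.
    positive = ranked[::-1][:k]
    return negative, positive


# A small subset of the values listed on this page.
effects = {
    "roma": -0.69,
    "ellen": -0.67,
    "quick": 0.71,
    "traction": 0.77,
    "sidx": 0.83,
}

neg, pos = top_logits(effects, k=2)
print(neg)  # [('roma', -0.69), ('ellen', -0.67)]
print(pos)  # [('sidx', 0.83), ('traction', 0.77)]
```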
Activations Density 0.221%
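An activation density such as this is commonly read as the share of sampled tokens on which the feature fires. The sketch below assumes that reading and a "greater than zero" firing rule; the page itself does not state how the figure was computed. The name `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds threshold.

    activations: list of per-token activation values (assumed input).
    threshold: firing cutoff; 0.0 is an assumption, not taken from the page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data for illustration only: 2 firing tokens out of 1000.
sample = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(sample):.3f}%")  # 0.200%
```

Under this reading, a density of 0.221% would mean the feature fires on roughly 2 of every 1000 sampled tokens.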