INDEX
Explanations
mathematical symbols and related text
phrases related to negative consequences or effects
Negative Logits
guiActiveUnfocused  -0.73
Tid                 -0.70
scatter             -0.66
Dirt                -0.63
Libyan              -0.61
FAR                 -0.60
Golem               -0.59
FAT                 -0.59
Belg                -0.57
subur               -0.57
Positive Logits
should    0.87
¹         0.85
-|        0.84
¢         0.83
could     0.83
§         0.79
£         0.79
catentry  0.78
âĢķ       0.78
there     0.78
Activations Density: 0.575%