INDEX
Explanations
phrases related to news headlines and articles
Negative Logits
Token         Logit
chuckle       -0.42
Architects    -0.42
defamation    -0.42
fixme         -0.41
jokes         -0.41
fallacy       -0.41
Schn          -0.41
earthquakes   -0.40
patented      -0.40
)?            -0.40
Positive Logits
Token         Logit
»             0.74
·             0.70
¹             0.66
¸             0.64
=~=~          0.64
¯             0.64
®             0.64
IJ            0.64
vice          0.64
ī             0.62
Activations Density: 1.370%