Explanations
proper nouns
text or phrases that suggest a threatening or negative context
Negative Logits
Berm         -0.56
Nicarag      -0.55
Meter        -0.54
Olympia      -0.53
Wilmington   -0.51
Pegasus      -0.49
rival        -0.48
Valhalla     -0.48
Western      -0.48
Pan          -0.48
Positive Logits
ï¸ı          1.44
then         1.00
âĢ           0.94
why          0.91
thats        0.85
ðŁ           0.85
#            0.83
ðŁĺ          0.83
please       0.83
ï¸           0.81
Activations Density 0.308%