INDEX
Explanations
statements of strong opinions or threats
concepts related to societal issues and moral dilemmas
Negative Logits
odder       -0.68
Scholar     -0.63
Horizons    -0.62
Sail        -0.61
Merchants   -0.60
Seller      -0.59
aughters    -0.58
merce       -0.58
Worlds      -0.57
Moons       -0.57
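Logit lists like these are commonly produced, in sparse-autoencoder dashboards, by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that convention; the function name `logit_effects`, the two-dimensional decoder vector, and the unembedding values are all hypothetical toy data that do not reproduce the scores listed here.

```python
# Hypothetical toy example: the real decoder vector, unembedding matrix,
# and vocabulary of the underlying model are not given in this document.
def logit_effects(decoder_dir, unembed, vocab, k=3):
    """Return (most negative, most positive) lists of (token, score) pairs.

    decoder_dir: list of d floats (the feature's decoder direction).
    unembed: d x V matrix as a list of rows (residual dim -> vocab logits).
    """
    d = len(decoder_dir)
    # Score for token j is the dot product of the decoder direction with column j.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]

vocab = ["Scholar", "Sail", '"""', "-----", "Worlds"]
decoder_dir = [1.0, -0.5]
unembed = [
    [-0.4, -0.2, 0.6, 0.3, -0.1],
    [0.4, 0.6, -0.2, -0.4, 0.2],
]
neg, pos = logit_effects(decoder_dir, unembed, vocab, k=2)
print(neg)
print(pos)
```

With these made-up numbers, "Scholar" and "Sail" come out most negative and `"""` and `-----` most positive, mirroring the shape (not the values) of the lists on this page.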
Positive Logits
・ (ãĥ»)      0.77
── (âĶĢâĶĢ)   0.72
"""           0.71
-"            0.68
-----         0.66
Õ             0.65
с (Ñģ)        0.63
"""           0.63
ilib          0.63
*****         0.62
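Several positive-logit tokens (for example ãĥ» and Ñģ) appear in the byte-level form used by GPT-2-style BPE vocabularies, where every raw byte is mapped to a printable character. Assuming this model uses that scheme (the token strings suggest it, but the source does not say), the sketch below inverts the byte table and decodes such strings to readable text. `decode_token` is a hypothetical helper name; the table construction follows GPT-2's `bytes_to_unicode` mapping.

```python
def bytes_to_unicode():
    # GPT-2-style table: printable bytes map to themselves, the remaining
    # bytes (0-32, 127-160, 173) map to code points 256, 257, ... in order.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Inverse table: printable character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token):
    """Turn a byte-level token string back into UTF-8 text (hypothetical helper)."""
    raw = bytes(CHAR_TO_BYTE[c] for c in token)
    # A lone lead byte such as 0xD5 is not valid UTF-8 on its own,
    # so it decodes to the replacement character U+FFFD.
    return raw.decode("utf-8", errors="replace")

for tok in ["ãĥ»", "âĶĢâĶĢ", "Ñģ", "Õ", "ilib"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption ãĥ» is the katakana middle dot "・", âĶĢâĶĢ is two box-drawing "─" characters, Ñģ is Cyrillic "с", and Õ is a single byte (0xD5) that only forms a character when followed by another token.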
Activations Density 0.178%
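Activation density is usually reported as the fraction of token positions in a sample dataset where the feature's activation is above zero. The sketch assumes that definition and a threshold of 0; the dataset and sample size behind the 0.178% figure are not given here, so the numbers below are illustrative only, and `activation_density` is a hypothetical name.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Illustrative sample: 178 firing positions out of 100,000 tokens.
acts = [0.0] * 99_822 + [1.0] * 178
density = activation_density(acts)
print(f"{density:.3%}")  # 178 / 100000 formatted as a percentage
```

A density this low means the feature is silent on the vast majority of tokens, which is typical of a specific, interpretable feature rather than a dense, always-on one.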