INDEX
Explanations
verbs or phrases related to negative or problematic situations
negative outcomes or failures
Negative Logits
Dome            -0.65
ilion           -0.62
aurus           -0.58
pursu           -0.58
Tart            -0.58
Demand          -0.58
Sapp            -0.55
zo              -0.54
Advertisements  -0.53
Hok             -0.53
Positive Logits
Ĥİ              0.92
ŃĶ              0.80
¿½              0.78
osate           0.76
inka            0.70
ĸļ              0.67
Ĥª              0.67
glers           0.66
¨               0.66
partName        0.66
Activations Density 0.610%
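
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the tool that produced this page.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled positions on which
    the feature fires at all; the source page does not define it.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        return 0.0
    return float(np.count_nonzero(acts > threshold)) / acts.size

# Hypothetical data: 1000 token positions, 6 of which activate the feature.
sample = np.zeros(1000)
sample[[3, 57, 210, 480, 733, 999]] = [0.4, 1.2, 0.1, 2.5, 0.9, 0.3]

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.610% would mean the feature is active on roughly 6 of every 1000 sampled tokens.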