INDEX
Explanations
proper nouns like people's names and place names
references to time periods or durations within narratives
Negative Logits
Token           Logit
IPM             -0.60
Powered         -0.58
Crunch          -0.54
overload        -0.54
0004            -0.53
Grizz           -0.53
giveaways       -0.53
Decay           -0.52
manufactures    -0.52
incentiv        -0.52
Positive Logits
Token           Logit
çͰ              0.80
married         0.76
æµ              0.75
thood           0.70
fame            0.66
ukemia          0.63
è£              0.60
bart            0.60
ع               0.60
Himself         0.60
Activations Density: 1.831%