INDEX
Explanations
names and mentions of individuals in various contexts
Negative Logits
itſelf             -0.88
pleaſure           -0.82
ſever              -0.82
Majefty            -0.81
purpoſe            -0.81
tagHelperRunner    -0.79
neſs               -0.79
متعلقه             -0.78
themſelves         -0.77
myſelf             -0.74
Positive Logits
N                  0.48
…                  0.42
..                 0.41
Phương             0.41
N                  0.40
brand              0.39
...                0.39
有名               0.39
T                  0.38
飾                 0.38
Activations Density 0.379%