INDEX
Explanations
references to examples and instances in discussions or arguments
introducing an example
Negative Logits
ſta         -0.68
houſe       -0.61
faſt        -0.57
Houſe       -0.55
purpoſe     -0.54
ſelf        -0.53
Majefty     -0.53
myſelf      -0.52
ſch         -0.52
pleaſure    -0.52
Positive Logits
egregious    0.47
instances    0.43
actitudes    0.42
irritating   0.39
infuriating  0.39
incidents    0.39
situation    0.38
fatto        0.38
situations   0.37
たとえば     0.37
Activations Density 0.100%