Explanations
words and phrases related to quality and texture
Negative Logits
friends   -0.89
laws      -0.84
aunts     -0.78
rules     -0.75
forums    -0.74
ources    -0.73
hops      -0.73
andals    -0.72
Trees     -0.72
posts     -0.72
Positive Logits
outcome         0.97
dose            0.94
impression      0.92
amount          0.91
foothold        0.90
effect          0.88
advantage       0.88
approximation   0.87
atmosphere      0.87
edge            0.87
Activations Density: 0.157%