Explanations
phrases indicating novelty or surprise
phrases indicating experiences of novelty or uniqueness
Negative Logits
Token        Logit
Tips         -0.70
lear         -0.67
reg          -0.65
ŃĶ           -0.64
fund         -0.63
supp         -0.62
haus         -0.61
absor        -0.61
ERT          -0.61
relations    -0.60
Positive Logits
Token        Logit
anything     0.95
anybody      0.88
anyone       0.81
daylight     0.77
ANY          0.71
anywhere     0.70
dime         0.70
them         0.68
nor          0.68
him          0.67
Activations Density 0.070%