INDEX
Explanations
words related to fire or physical locations
terms related to quality and authenticity
Negative Logits
Rush       -0.65
Seym       -0.64
Spy        -0.62
Prescott   -0.62
Snake      -0.62
Nanto      -0.61
pta        -0.61
KT         -0.60
kson       -0.60
enza       -0.59
Positive Logits
bon        1.04
fide       1.00
uit        0.94
iously     0.93
bon        0.89
eless      0.87
nets       0.85
bat        0.81
ificial    0.81
uin        0.81
Activations Density: 0.015%